<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.10">
<title>UNICHARAMBIGS(5)</title>
<style type="text/css">
/* Shared CSS for AsciiDoc xhtml11 and html5 backends */

/* Default font. */
body {
  font-family: Georgia,serif;
}

/* Title font. */
h1, h2, h3, h4, h5, h6,
div.title, caption.title,
thead, p.table.header,
#toctitle,
#author, #revnumber, #revdate, #revremark,
#footer {
  font-family: Arial,Helvetica,sans-serif;
}

body {
  margin: 1em 5% 1em 5%;
}

a {
  color: blue;
  text-decoration: underline;
}
a:visited {
  color: fuchsia;
}

em {
  font-style: italic;
  color: navy;
}

strong {
  font-weight: bold;
  color: #083194;
}

h1, h2, h3, h4, h5, h6 {
  color: #527bbd;
  margin-top: 1.2em;
  margin-bottom: 0.5em;
  line-height: 1.3;
}

h1, h2, h3 {
  border-bottom: 2px solid silver;
}
h2 {
  padding-top: 0.5em;
}
h3 {
  float: left;
}
h3 + * {
  clear: left;
}
h5 {
  font-size: 1.0em;
}

div.sectionbody {
  margin-left: 0;
}

hr {
  border: 1px solid silver;
}

p {
  margin-top: 0.5em;
  margin-bottom: 0.5em;
}

ul, ol, li > p {
  margin-top: 0;
}
ul > li     { color: #aaa; }
ul > li > * { color: black; }

.monospaced, code, pre {
  font-family: "Courier New", Courier, monospace;
  font-size: inherit;
  color: navy;
  padding: 0;
  margin: 0;
}
pre {
  white-space: pre-wrap;
}

#author {
  color: #527bbd;
  font-weight: bold;
  font-size: 1.1em;
}
#email {
}
#revnumber, #revdate, #revremark {
}

#footer {
  font-size: small;
  border-top: 2px solid silver;
  padding-top: 0.5em;
  margin-top: 4.0em;
}
#footer-text {
  float: left;
  padding-bottom: 0.5em;
}
#footer-badges {
  float: right;
  padding-bottom: 0.5em;
}

#preamble {
  margin-top: 1.5em;
  margin-bottom: 1.5em;
}
div.imageblock, div.exampleblock, div.verseblock,
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
div.admonitionblock {
  margin-top: 1.0em;
  margin-bottom: 1.5em;
}
div.admonitionblock {
  margin-top: 2.0em;
  margin-bottom: 2.0em;
  margin-right: 10%;
  color: #606060;
}

div.content { /* Block element content. */
  padding: 0;
}

/* Block element titles. */
div.title, caption.title {
  color: #527bbd;
  font-weight: bold;
  text-align: left;
  margin-top: 1.0em;
  margin-bottom: 0.5em;
}
div.title + * {
  margin-top: 0;
}

td div.title:first-child {
  margin-top: 0.0em;
}
div.content div.title:first-child {
  margin-top: 0.0em;
}
div.content + div.title {
  margin-top: 0.0em;
}

div.sidebarblock > div.content {
  background: #ffffee;
  border: 1px solid #dddddd;
  border-left: 4px solid #f0f0f0;
  padding: 0.5em;
}

div.listingblock > div.content {
  border: 1px solid #dddddd;
  border-left: 5px solid #f0f0f0;
  background: #f8f8f8;
  padding: 0.5em;
}

div.quoteblock, div.verseblock {
  padding-left: 1.0em;
  margin-left: 1.0em;
  margin-right: 10%;
  border-left: 5px solid #f0f0f0;
  color: #888;
}

div.quoteblock > div.attribution {
  padding-top: 0.5em;
  text-align: right;
}

div.verseblock > pre.content {
  font-family: inherit;
  font-size: inherit;
}
div.verseblock > div.attribution {
  padding-top: 0.75em;
  text-align: left;
}
/* DEPRECATED: Pre version 8.2.7 verse style literal block. */
div.verseblock + div.attribution {
  text-align: left;
}

div.admonitionblock .icon {
  vertical-align: top;
  font-size: 1.1em;
  font-weight: bold;
  text-decoration: underline;
  color: #527bbd;
  padding-right: 0.5em;
}
div.admonitionblock td.content {
  padding-left: 0.5em;
  border-left: 3px solid #dddddd;
}

div.exampleblock > div.content {
  border-left: 3px solid #dddddd;
  padding-left: 0.5em;
}

div.imageblock div.content { padding-left: 0; }
span.image img { border-style: none; vertical-align: text-bottom; }
a.image:visited { color: white; }

dl {
  margin-top: 0.8em;
  margin-bottom: 0.8em;
}
dt {
  margin-top: 0.5em;
  margin-bottom: 0;
  font-style: normal;
  color: navy;
}
dd > *:first-child {
  margin-top: 0.1em;
}

ul, ol {
    list-style-position: outside;
}
ol.arabic {
  list-style-type: decimal;
}
ol.loweralpha {
  list-style-type: lower-alpha;
}
ol.upperalpha {
  list-style-type: upper-alpha;
}
ol.lowerroman {
  list-style-type: lower-roman;
}
ol.upperroman {
  list-style-type: upper-roman;
}

div.compact ul, div.compact ol,
div.compact p, div.compact p,
div.compact div, div.compact div {
  margin-top: 0.1em;
  margin-bottom: 0.1em;
}

tfoot {
  font-weight: bold;
}
td > div.verse {
  white-space: pre;
}

div.hdlist {
  margin-top: 0.8em;
  margin-bottom: 0.8em;
}
div.hdlist tr {
  padding-bottom: 15px;
}
dt.hdlist1.strong, td.hdlist1.strong {
  font-weight: bold;
}
td.hdlist1 {
  vertical-align: top;
  font-style: normal;
  padding-right: 0.8em;
  color: navy;
}
td.hdlist2 {
  vertical-align: top;
}
div.hdlist.compact tr {
  margin: 0;
  padding-bottom: 0;
}

.comment {
  background: yellow;
}

.footnote, .footnoteref {
  font-size: 0.8em;
}

span.footnote, span.footnoteref {
  vertical-align: super;
}

#footnotes {
  margin: 20px 0 20px 0;
  padding: 7px 0 0 0;
}

#footnotes div.footnote {
  margin: 0 0 5px 0;
}

#footnotes hr {
  border: none;
  border-top: 1px solid silver;
  height: 1px;
  text-align: left;
  margin-left: 0;
  width: 20%;
  min-width: 100px;
}

div.colist td {
  padding-right: 0.5em;
  padding-bottom: 0.3em;
  vertical-align: top;
}
div.colist td img {
  margin-top: 0.3em;
}

@media print {
  #footer-badges { display: none; }
}

#toc {
  margin-bottom: 2.5em;
}

#toctitle {
  color: #527bbd;
  font-size: 1.1em;
  font-weight: bold;
  margin-top: 1.0em;
  margin-bottom: 0.1em;
}

div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
  margin-top: 0;
  margin-bottom: 0;
}
div.toclevel2 {
  margin-left: 2em;
  font-size: 0.9em;
}
div.toclevel3 {
  margin-left: 4em;
  font-size: 0.9em;
}
div.toclevel4 {
  margin-left: 6em;
  font-size: 0.9em;
}

span.aqua { color: aqua; }
span.black { color: black; }
span.blue { color: blue; }
span.fuchsia { color: fuchsia; }
span.gray { color: gray; }
span.green { color: green; }
span.lime { color: lime; }
span.maroon { color: maroon; }
span.navy { color: navy; }
span.olive { color: olive; }
span.purple { color: purple; }
span.red { color: red; }
span.silver { color: silver; }
span.teal { color: teal; }
span.white { color: white; }
span.yellow { color: yellow; }

span.aqua-background { background: aqua; }
span.black-background { background: black; }
span.blue-background { background: blue; }
span.fuchsia-background { background: fuchsia; }
span.gray-background { background: gray; }
span.green-background { background: green; }
span.lime-background { background: lime; }
span.maroon-background { background: maroon; }
span.navy-background { background: navy; }
span.olive-background { background: olive; }
span.purple-background { background: purple; }
span.red-background { background: red; }
span.silver-background { background: silver; }
span.teal-background { background: teal; }
span.white-background { background: white; }
span.yellow-background { background: yellow; }

span.big { font-size: 2em; }
span.small { font-size: 0.6em; }

span.underline { text-decoration: underline; }
span.overline { text-decoration: overline; }
span.line-through { text-decoration: line-through; }

div.unbreakable { page-break-inside: avoid; }


/*
 * xhtml11 specific
 *
 * */

div.tableblock {
  margin-top: 1.0em;
  margin-bottom: 1.5em;
}
div.tableblock > table {
  border: 3px solid #527bbd;
}
thead, p.table.header {
  font-weight: bold;
  color: #527bbd;
}
p.table {
  margin-top: 0;
}
/* Because the table frame attribute is overridden by CSS in most browsers. */
div.tableblock > table[frame="void"] {
  border-style: none;
}
div.tableblock > table[frame="hsides"] {
  border-left-style: none;
  border-right-style: none;
}
div.tableblock > table[frame="vsides"] {
  border-top-style: none;
  border-bottom-style: none;
}


/*
 * html5 specific
 *
 * */

table.tableblock {
  margin-top: 1.0em;
  margin-bottom: 1.5em;
}
thead, p.tableblock.header {
  font-weight: bold;
  color: #527bbd;
}
p.tableblock {
  margin-top: 0;
}
table.tableblock {
  border-width: 3px;
  border-spacing: 0px;
  border-style: solid;
  border-color: #527bbd;
  border-collapse: collapse;
}
th.tableblock, td.tableblock {
  border-width: 1px;
  padding: 4px;
  border-style: solid;
  border-color: #527bbd;
}

table.tableblock.frame-topbot {
  border-left-style: hidden;
  border-right-style: hidden;
}
table.tableblock.frame-sides {
  border-top-style: hidden;
  border-bottom-style: hidden;
}
table.tableblock.frame-none {
  border-style: hidden;
}

th.tableblock.halign-left, td.tableblock.halign-left {
  text-align: left;
}
th.tableblock.halign-center, td.tableblock.halign-center {
  text-align: center;
}
th.tableblock.halign-right, td.tableblock.halign-right {
  text-align: right;
}

th.tableblock.valign-top, td.tableblock.valign-top {
  vertical-align: top;
}
th.tableblock.valign-middle, td.tableblock.valign-middle {
  vertical-align: middle;
}
th.tableblock.valign-bottom, td.tableblock.valign-bottom {
  vertical-align: bottom;
}


/*
 * manpage specific
 *
 * */

body.manpage h1 {
  padding-top: 0.5em;
  padding-bottom: 0.5em;
  border-top: 2px solid silver;
  border-bottom: 2px solid silver;
}
body.manpage h2 {
  border-style: none;
}
body.manpage div.sectionbody {
  margin-left: 3em;
}

@media print {
  body.manpage div#toc { display: none; }
}


</style>
<script type="text/javascript">
/*<![CDATA[*/
var asciidoc = {  // Namespace.

/////////////////////////////////////////////////////////////////////
// Table Of Contents generator
/////////////////////////////////////////////////////////////////////

/* Author: Mihai Bazon, September 2002
 * http://students.infoiasi.ro/~mishoo
 *
 * Table Of Content generator
 * Version: 0.4
 *
 * Feel free to use this script under the terms of the GNU General Public
 * License, as long as you do not remove or alter this notice.
 */

 /* modified by Troy D. Hanson, September 2006. License: GPL */
 /* modified by Stuart Rackham, 2006, 2009. License: GPL */

// toclevels = 1..4.
toc: function (toclevels) {

  function getText(el) {
    var text = "";
    for (var i = el.firstChild; i != null; i = i.nextSibling) {
      if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
        text += i.data;
      else if (i.firstChild != null)
        text += getText(i);
    }
    return text;
  }

  function TocEntry(el, text, toclevel) {
    this.element = el;
    this.text = text;
    this.toclevel = toclevel;
  }

  function tocEntries(el, toclevels) {
    var result = new Array;
    var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
    // Function that scans the DOM tree for header elements (the DOM2
    // nodeIterator API would be a better technique but not supported by all
    // browsers).
    var iterate = function (el) {
      for (var i = el.firstChild; i != null; i = i.nextSibling) {
        if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
          var mo = re.exec(i.tagName);
          if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
            result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
          }
          iterate(i);
        }
      }
    }
    iterate(el);
    return result;
  }

  var toc = document.getElementById("toc");
  if (!toc) {
    return;
  }

  // Delete existing TOC entries in case we're reloading the TOC.
  var tocEntriesToRemove = [];
  var i;
  for (i = 0; i < toc.childNodes.length; i++) {
    var entry = toc.childNodes[i];
    if (entry.nodeName.toLowerCase() == 'div'
     && entry.getAttribute("class")
     && entry.getAttribute("class").match(/^toclevel/))
      tocEntriesToRemove.push(entry);
  }
  for (i = 0; i < tocEntriesToRemove.length; i++) {
    toc.removeChild(tocEntriesToRemove[i]);
  }

  // Rebuild TOC entries.
  var entries = tocEntries(document.getElementById("content"), toclevels);
  for (var i = 0; i < entries.length; ++i) {
    var entry = entries[i];
    if (entry.element.id == "")
      entry.element.id = "_toc_" + i;
    var a = document.createElement("a");
    a.href = "#" + entry.element.id;
    a.appendChild(document.createTextNode(entry.text));
    var div = document.createElement("div");
    div.appendChild(a);
    div.className = "toclevel" + entry.toclevel;
    toc.appendChild(div);
  }
  if (entries.length == 0)
    toc.parentNode.removeChild(toc);
},


/////////////////////////////////////////////////////////////////////
// Footnotes generator
/////////////////////////////////////////////////////////////////////

/* Based on footnote generation code from:
 * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
 */

footnotes: function () {
  // Delete existing footnote entries in case we're reloading the footnotes.
  var i;
  var noteholder = document.getElementById("footnotes");
  if (!noteholder) {
    return;
  }
  var entriesToRemove = [];
  for (i = 0; i < noteholder.childNodes.length; i++) {
    var entry = noteholder.childNodes[i];
    if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
      entriesToRemove.push(entry);
  }
  for (i = 0; i < entriesToRemove.length; i++) {
    noteholder.removeChild(entriesToRemove[i]);
  }

  // Rebuild footnote entries.
  var cont = document.getElementById("content");
  var spans = cont.getElementsByTagName("span");
  var refs = {};
  var n = 0;
  for (i=0; i<spans.length; i++) {
    if (spans[i].className == "footnote") {
      n++;
      var note = spans[i].getAttribute("data-note");
      if (!note) {
        // Use [\s\S] in place of . so multi-line matches work.
        // Because JavaScript has no s (dotall) regex flag.
        note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
        spans[i].innerHTML =
          "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
          "' title='View footnote' class='footnote'>" + n + "</a>]";
        spans[i].setAttribute("data-note", note);
      }
      noteholder.innerHTML +=
        "<div class='footnote' id='_footnote_" + n + "'>" +
        "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
        n + "</a>. " + note + "</div>";
      var id =spans[i].getAttribute("id");
      if (id != null) refs["#"+id] = n;
    }
  }
  if (n == 0)
    noteholder.parentNode.removeChild(noteholder);
  else {
    // Process footnoterefs.
    for (i=0; i<spans.length; i++) {
      if (spans[i].className == "footnoteref") {
        var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
        href = href.match(/#.*/)[0];  // Because IE returns the full URL.
        n = refs[href];
        spans[i].innerHTML =
          "[<a href='#_footnote_" + n +
          "' title='View footnote' class='footnote'>" + n + "</a>]";
      }
    }
  }
},

install: function(toclevels) {
  var timerId;

  function reinstall() {
    asciidoc.footnotes();
    if (toclevels) {
      asciidoc.toc(toclevels);
    }
  }

  function reinstallAndRemoveTimer() {
    clearInterval(timerId);
    reinstall();
  }

  timerId = setInterval(reinstall, 500);
  if (document.addEventListener)
    document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
  else
    window.onload = reinstallAndRemoveTimer;
}

}
asciidoc.install();
/*]]>*/
</script>
</head>
<body class="article">
<div id="header">
<h1>UNICHARAMBIGS(5)</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_name">NAME</h2>
<div class="sectionbody">
<div class="paragraph"><p>unicharambigs - Tesseract unicharset ambiguities</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_description">DESCRIPTION</h2>
<div class="sectionbody">
<div class="paragraph"><p>The unicharambigs file (a component of traineddata, see combine_tessdata(1))
is used by Tesseract to represent possible ambiguities between characters,
or groups of characters.</p></div>
<div class="paragraph"><p>The file contains a number of lines, laid out as follows:</p></div>
<div class="literalblock">
<div class="content monospaced">
<pre>[num] &lt;TAB&gt; [char(s)] &lt;TAB&gt; [num] &lt;TAB&gt; [char(s)] &lt;TAB&gt; [num]</pre>
</div></div>
<div class="hdlist"><table>
<tr>
<td class="hdlist1">
Field one
<br>
</td>
<td class="hdlist2">
<p style="margin-top: 0;">
the number of characters contained in field two
</p>
</td>
</tr>
<tr>
<td class="hdlist1">
Field two
<br>
</td>
<td class="hdlist2">
<p style="margin-top: 0;">
the character sequence to be replaced
</p>
</td>
</tr>
<tr>
<td class="hdlist1">
Field three
<br>
</td>
<td class="hdlist2">
<p style="margin-top: 0;">
the number of characters contained in field four
</p>
</td>
</tr>
<tr>
<td class="hdlist1">
Field four
<br>
</td>
<td class="hdlist2">
<p style="margin-top: 0;">
the character sequence used to replace field two
</p>
</td>
</tr>
<tr>
<td class="hdlist1">
Field five
<br>
</td>
<td class="hdlist2">
<p style="margin-top: 0;">
contains either 1 or 0. 1 denotes a mandatory
replacement, 0 denotes an optional replacement.
</p>
</td>
</tr>
</table></div>
<div class="paragraph"><p>Characters appearing in fields two and four should appear in
unicharset. The numbers in fields one and three refer to the
number of unichars (not bytes).</p></div>
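<div class="paragraph"><p>As a minimal sketch of the layout described above, the following Python function splits one v1 line into its five fields and checks that the counts in fields one and three match the sequences in fields two and four. The names <code>Ambig</code> and <code>parse_v1_line</code> are hypothetical and not part of Tesseract. The sketch also assumes that the unichars within a field are separated by single spaces and that each space-separated token is one unichar.</p></div>

```python
from typing import NamedTuple


class Ambig(NamedTuple):
    """One parsed ambiguity line (hypothetical container, not a Tesseract type)."""
    wrong: list        # unichars to be replaced (field two)
    replacement: list  # unichars used as the replacement (field four)
    mandatory: bool    # True when field five is 1


def parse_v1_line(line):
    """Parse one tab-separated v1 line and check both unichar counts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError("expected 5 tab-separated fields, got %d" % len(fields))
    n_wrong, wrong, n_repl, repl, kind = fields
    # Assumption: unichars inside a field are separated by single spaces.
    wrong_chars = wrong.split(" ")
    repl_chars = repl.split(" ")
    if len(wrong_chars) != int(n_wrong) or len(repl_chars) != int(n_repl):
        raise ValueError("unichar count does not match the sequence")
    if kind not in ("0", "1"):
        raise ValueError("field five must be 0 or 1")
    return Ambig(wrong_chars, repl_chars, kind == "1")


if __name__ == "__main__":
    print(parse_v1_line("1\tm\t2\tr n\t0"))
```

<div class="paragraph"><p>Counting tokens rather than bytes mirrors the rule that fields one and three refer to unichars, so a multi-byte UTF-8 character still counts as one.</p></div>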
</div>
</div>
<div class="sect1">
<h2 id="_example">EXAMPLE</h2>
<div class="sectionbody">
<div class="literalblock">
<div class="content monospaced">
<pre>v1
2       ' '     1       "     1
1       m       2       r n   0
3       i i i   1       m     0</pre>
</div></div>
<div class="paragraph"><p>The first line is a version identifier.
In this example, all instances of the <em>2</em> character sequence <em>''</em> will
<strong>always</strong> be replaced by the <em>1</em> character sequence <em>"</em>; a <em>1</em> character
sequence <em>m</em> <strong>may</strong> be replaced by the <em>2</em> character sequence <em>rn</em>, and
the <em>3</em> character sequence <em>iii</em> <strong>may</strong> be replaced by the <em>1</em> character
sequence <em>m</em>.</p></div>
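<div class="paragraph"><p>The difference between mandatory and optional entries can be illustrated with a small Python sketch. It is only an illustration on plain strings: Tesseract applies ambiguities to candidate character sequences during recognition, not to finished output text, and the names <code>apply_mandatory</code> and <code>RULES</code> are hypothetical.</p></div>

```python
def apply_mandatory(text, rules):
    """Replace every occurrence of each mandatory 'wrong' string.

    rules is a list of (wrong, replacement, mandatory) tuples. Optional
    rules (mandatory is False) are skipped, because they only describe
    alternatives that may or may not be chosen.
    """
    for wrong, replacement, mandatory in rules:
        if mandatory:
            text = text.replace(wrong, replacement)
    return text


# The three entries of the example file, written as plain strings.
RULES = [
    ("''", '"', True),     # 2  ' '    1  "  1
    ("m", "rn", False),    # 1  m      2  r n  0
    ("iii", "m", False),   # 3  i i i  1  m  0
]

if __name__ == "__main__":
    print(apply_mandatory("''modern''", RULES))
```

<div class="paragraph"><p>Only the pair of single quotes is rewritten here; the optional <em>m</em> and <em>iii</em> entries leave the text unchanged.</p></div>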
<div class="paragraph"><p>Versions 3.03 and later support a new, simpler format for the unicharambigs
file:</p></div>
<div class="literalblock">
<div class="content monospaced">
<pre>v2
'' " 1
m rn 0
iii m 0</pre>
</div></div>
<div class="paragraph"><p>In this format, the "error" and "correction" are plain UTF-8 strings
separated by a space, followed, after another space, by the same type specifier
as in v1 (0 for an optional and 1 for a mandatory substitution). The downside
of this simpler format is that Tesseract has to encode the UTF-8 strings
into the components of the unicharset. In complex scripts, this encoding
may be ambiguous. In this case, the encoding is chosen so as to use the
fewest UTF-8 characters for each component, i.e. the shortest unicharset
components will make up the encoding.</p></div>
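<div class="paragraph"><p>A v2 file can be read with very little code. The Python sketch below is a hypothetical reader, not Tesseract's own parser: it assumes that the first non-empty line is the version identifier and that neither string contains whitespace. It does not attempt the encoding into unicharset components described above.</p></div>

```python
def parse_v2_line(line):
    """Split one v2 line into (error, correction, mandatory)."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError("expected 'error correction type'")
    error, correction, kind = parts
    if kind not in ("0", "1"):
        raise ValueError("type must be 0 or 1")
    return error, correction, kind == "1"


def parse_v2(text):
    """Parse the text of a v2 file into a list of rule tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the version identifier.
    if not lines or lines[0].strip() != "v2":
        raise ValueError("missing v2 version identifier")
    return [parse_v2_line(ln) for ln in lines[1:]]


if __name__ == "__main__":
    for rule in parse_v2("v2\n'' \" 1\nm rn 0\niii m 0\n"):
        print(rule)
```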
</div>
</div>
<div class="sect1">
<h2 id="_history">HISTORY</h2>
<div class="sectionbody">
<div class="paragraph"><p>The unicharambigs file first appeared in Tesseract 3.00. Prior to that, a
similar format called DangAmbigs (<em>dangerous ambiguities</em>) was used. That
format was almost identical, except that only mandatory replacements could be
specified and field 5 was absent.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_bugs">BUGS</h2>
<div class="sectionbody">
<div class="paragraph"><p>This is a documentation "bug": it&#8217;s not currently clear what should be done
in the case of ligatures (such as <em>fi</em>) which may also appear as regular
letters in the unicharset.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_see_also">SEE ALSO</h2>
<div class="sectionbody">
<div class="paragraph"><p>tesseract(1), unicharset(5)
<a href="https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-3.03%E2%80%933.05.html#the-unicharambigs-file">https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-3.03%E2%80%933.05.html#the-unicharambigs-file</a></p></div>
</div>
</div>
<div class="sect1">
<h2 id="_author">AUTHOR</h2>
<div class="sectionbody">
<div class="paragraph"><p>The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-present).</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
Last updated
 2020-02-06 21:45:54 CET
</div>
</div>
</body>
</html>