Training an OCR model on my handwriting
English model
For the English model I have so far split one of my English exams into about 220 slices, which are used to train the model for now.
To test it, run:
    tesseract /mnt/d/Programming/tesstrain/data/eng_Erik-ground-truth/eva1_slice1.tif stdout --tessdata-dir /mnt/d/Programming/tesstrain/data -l eng_Erik --psm 7
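To check every slice instead of a single file, the same call can be wrapped in a small script. This is only a minimal sketch: the helper names `build_tesseract_command` and `recognize_all` are made up for illustration, and it assumes that `tesseract` is on the PATH and that all slices are `.tif` files in the ground-truth directory.

```python
import subprocess
from pathlib import Path

# Paths taken from the test command; adjust if the layout differs.
TESSDATA_DIR = "/mnt/d/Programming/tesstrain/data"
GROUND_TRUTH_DIR = "/mnt/d/Programming/tesstrain/data/eng_Erik-ground-truth"


def build_tesseract_command(image_path, tessdata_dir=TESSDATA_DIR,
                            model="eng_Erik", psm=7):
    """Return the argument list for one single-line recognition call."""
    return [
        "tesseract", str(image_path), "stdout",
        "--tessdata-dir", str(tessdata_dir),
        "-l", model,
        "--psm", str(psm),
    ]


def recognize_all(directory=GROUND_TRUTH_DIR):
    """Run the model on every .tif slice and return {file name: recognized text}."""
    results = {}
    for image_path in sorted(Path(directory).glob("*.tif")):
        completed = subprocess.run(
            build_tesseract_command(image_path),
            capture_output=True, text=True, check=True,
        )
        results[image_path.name] = completed.stdout.strip()
    return results
```

Calling `recognize_all()` and printing the returned dictionary gives one recognized line per slice, which makes it easier to spot slices the model still struggles with.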
To train, change into /mnt/d/Programming/tesstrain and run:
    make training MODEL_NAME=eng_Erik MAX_ITERATIONS=300000
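Before starting a long training run it can help to verify that every image slice has a matching transcription, since tesstrain pairs each image with a `.gt.txt` file of the same base name. The following is a rough sketch under that assumption; the function names `find_unpaired` and `check_directory` are hypothetical and the `.tif` suffix is taken from the test command.

```python
import os


def find_unpaired(file_names, image_suffix=".tif", text_suffix=".gt.txt"):
    """Return (images without transcription, transcriptions without image).

    Both lists contain base names with the suffix removed.
    """
    images = {n[:-len(image_suffix)] for n in file_names if n.endswith(image_suffix)}
    texts = {n[:-len(text_suffix)] for n in file_names if n.endswith(text_suffix)}
    return sorted(images - texts), sorted(texts - images)


def check_directory(directory):
    """Print a short report for one ground-truth directory."""
    missing_text, missing_image = find_unpaired(os.listdir(directory))
    for name in missing_text:
        print("no transcription for image:", name)
    for name in missing_image:
        print("no image for transcription:", name)
    return not missing_text and not missing_image
```

For example, `check_directory("/mnt/d/Programming/tesstrain/data/eng_Erik-ground-truth")` returns `True` only when all slices and transcriptions are paired.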
Timeline (in steps of 10,000 iterations)
Sentence: The term postcolonialism means firstly the historic time and secondly
- whes term posterlanialine man foriotly the intinnc bimm oad ay eeentle
- Whwu tars postaolonallinm means fiestly the histinc thme and secorphld
- Tiy tears posifaolioicalimm meams fiexstl the titisic tme and segorall
- Thr term postficaolonialism meams fisitly the histac time and secoroll.
- Tie term postoolonialism yeans fixistly the istiic time and sucarile,
- TBVhe term postoclnalisms means fisstly the histisic thrme and segorsle
- The ters postcoloniabiss wmeass frstty the hetoric tirse and secenedh
- The temm posicolonialiasm means fistlty the tistomic tirme and seconsle
- The tem postoeclonialiasm means fiat the iistoric time and seconoll
- The term postcolonialism means frsstl the histosic time and sucoreole
- The tems postoclonialism means fixstly the histosic thime and secorole
- Bhe tm postaclonialisms means fisstt the tistom time and sweormls,
- The therm postaclonialisrm means firty the thistoric thrme and secrowolld
- The term postcolonialisms means fiisty ithe historic thame and secenol
- Tdhe nsm postcolonialisms means firsttky the histric ptire and secoenoll
- Te temm postcolonialism means firstly the tistosc time and secoreldl
- The tamm postaclonialisms means firtty the histosic tim and secoroll
- The tem postcclonialisms yeans firstty the historic thre and seconrold
- The term pGostooclonialisms means fis iy the ristoric tire and ecrald
- The term postoclonialism meass firsly the historic the and seceraled
- The tesm postoclonialismm means firrtly the tistoric thime and seconcle
- The temm postaclonialism means fisrttly the tistoric time and seconcle
- TVhe term postcolonialisms means fxtly the historic time and secorcole
- The term postcolonialims means frmesly the historic time and secorole
- The ters postcolonialims means fisssly the tistoric time and seconole
- Thie term postcolonialisss means ficstly the historic time and secanole
- The term postcolomnialims vyeans firisttly the tistoric tirme and seconol
- he ters postcclonialisss meams fisstly the tistoric time and seconcle
- Tie term postoclonialisms means fiissly the historic time and secondl
- The trrm postcoclonialisss means firstly the tietoric time and secondl
And here is the result of the standard Tesseract eng model: Che Jom pethecheiwbly) on Aas, poe Nefevi. Jin, cath vec.
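To put a number on the progress in the timeline, each output can be compared with the reference sentence using a character error rate (edit distance divided by reference length). This is a sketch of one common way to measure it, not something the training run itself reports; the function names are chosen for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance relative to the reference length (0.0 means identical)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "The term postcolonialism means firstly the historic time and secondly"
outputs = [
    # first and last entries of the timeline
    "whes term posterlanialine man foriotly the intinnc bimm oad ay eeentle",
    "The trrm postcoclonialisss means firstly the tietoric time and secondl",
]
for text in outputs:
    print(f"{character_error_rate(reference, text):.2f}  {text}")
```

Running this over all timeline entries gives one value per checkpoint, so the trend is visible without reading every line.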
Useful Python scripts
A file renamer
import os
import sys

if __name__ == '__main__':
    if len(sys.argv) not in [4, 3]:
        print('Usage: python file-renamer.py <search_string> <replace_string> [<directory>]')
        sys.exit(1)

    search_string = sys.argv[1]
    replace_string = sys.argv[2]
    # The directory is optional and defaults to the current working directory.
    directory = sys.argv[3] if len(sys.argv) == 4 else os.getcwd()

    count = 0
    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        if os.path.isfile(file_path):
            new_filename = filename.replace(search_string, replace_string)
            new_file_path = os.path.join(directory, new_filename)
            # Only rename files whose name actually changes.
            if filename != new_filename:
                os.rename(file_path, new_file_path)
                count += 1

    print('Renamed {} files.'.format(count))
Adding .gt.txt to files without a file extension
import os
import sys


def rename_files_in_directory(directory):
    """
    Renames files in a given directory by appending '.gt.txt' to their names.

    Args:
        directory (str): The path to the directory containing the files.

    Returns:
        None
    """
    current_directory = directory
    file_list = os.listdir(current_directory)

    for file_name in file_list:
        # Skip names that already contain 'gt'; handle only '.txt' files and files without an extension.
        if 'gt' not in file_name and (file_name.endswith('.txt') or '.' not in file_name):
            if '.' not in file_name:
                new_name = file_name + '.gt.txt'
            else:
                new_name = file_name.replace('.txt', '.gt.txt')

            os.rename(os.path.join(current_directory, file_name),
                      os.path.join(current_directory, new_name))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        directory = sys.argv[1]
        rename_files_in_directory(directory)
    else:
        rename_files_in_directory(os.getcwd())