Python Script to Auto-Sort 3TB of Documents (PDF, DOCX, TXT, HWP) Based on Korea

Hello,

I am looking for an experienced Python developer to create a file-sorting script. I have a massive archive (approx. 3TB) of research documents stored on an external hard drive. I need a script that reads the internal text content of these files and automatically copies them into 6 specific folders based on predefined Korean keywords.

[Key Requirements]
Target File Types: .txt, .docx, .pdf, and .hwp (Korean Word Processor). The script must be able to extract text from these formats. Support for Korean encoding (UTF-8, EUC-KR) is mandatory.
Content-Based Sorting: The script must read the body text of the files, not just the file names.
Keyword Mapping: I have 6 categories, each with a specific list of Korean keywords. (e.g., Category 1 keywords: "음양오행", "조후", etc.)
Action: The script should COPY (not move/delete) the matching files to a designated "Sorted" folder structure.
Cross-Referencing: If a single document contains keywords belonging to both Category 1 and Category 4, the file must be copied into both folders.
Error Handling & Performance: Since the volume is 3TB, the script must be robust. If a file is corrupted or unreadable, the script should simply skip it, log the error, and continue running without crashing.

[Deliverables]
No-Code Setup: I am not a programmer. The final deliverable MUST be a standalone Windows executable (.exe) OR a very simple one-click batch file (.bat). I cannot set up complex coding environments.
Config File: Please include a simple configuration file (like config.txt or keywords.json) so I can easily update or add new keywords later without modifying the code.
Clear Instructions: A brief, step-by-step English manual on how to run it.

[Skills Required]
Python
Text Extraction (PyPDF2, python-docx, olefile/pyhwp for .hwp)
File & Data Management
Experience with CJK (Korean) text encoding is a big plus.

If you understand these requirements, please start your proposal with the word "Sorted" so I know you read the description.

Please provide your estimated time of delivery and a brief explanation of how you plan to handle the .hwp and .pdf text extraction.
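
For reference, the matching and copy behavior described under [Key Requirements] might look roughly like the sketch below. It is only an illustration, not a required implementation: it uses the Python standard library alone and handles plain .txt content, so extraction for .pdf, .docx, and .hwp would still need to be added. The function names (`decode_korean`, `match_categories`, `sort_file`), the layout assumed for keywords.json (category name mapped to a list of keywords), and the logger name are all assumptions made for this example.

```python
import json
import logging
import shutil
from pathlib import Path

# Hypothetical logger name; the real script would attach a file handler to it.
log = logging.getLogger("sorter")


def load_keywords(config_path: Path) -> dict[str, list[str]]:
    """Read the keyword config. Assumed layout: {"Category 1": ["음양오행", "조후"], ...}."""
    return json.loads(config_path.read_text(encoding="utf-8"))


def decode_korean(raw: bytes) -> str:
    """Try UTF-8 (with or without BOM) first, then CP949, a superset of EUC-KR."""
    for encoding in ("utf-8-sig", "cp949"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("text is neither valid UTF-8 nor CP949/EUC-KR")


def match_categories(text: str, keywords: dict[str, list[str]]) -> list[str]:
    """Return every category that has at least one keyword in the text."""
    return [
        category
        for category, words in keywords.items()
        if any(word in text for word in words)
    ]


def sort_file(src: Path, out_root: Path, keywords: dict[str, list[str]]) -> list[str]:
    """Copy src into one folder per matching category; skip and log unreadable files."""
    try:
        text = decode_korean(src.read_bytes())
    except (OSError, ValueError) as exc:
        log.warning("skipped %s: %s", src, exc)
        return []
    matched = match_categories(text, keywords)
    for category in matched:
        dest_dir = out_root / category
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 leaves the original in place. Note: files that share a name
        # would overwrite each other here; a real script needs a renaming rule.
        shutil.copy2(src, dest_dir / src.name)
    return matched
```

In this sketch a file whose text contains keywords from two categories is copied into both category folders, and a file that cannot be read or decoded is logged and skipped so the run continues. A full script would call `sort_file` for every file found on the drive and choose a text extractor by file extension.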