DaemonForums  

OpenBSD Packages and Ports Installation and upgrading of packages and ports on OpenBSD.

  #1   (View Single Post)  
Old 2nd June 2020
PapaParrot PapaParrot is offline
parrot
 
Join Date: Jul 2015
Location: Durango, Mx.
Posts: 461
Package or program to convert MS Word .doc files

Besides LibreOffice, does anyone know if there are other packages/software for OpenBSD that can read and convert MSword documents/files to a normal readable text format? E.g.:
my_example_file.DOC, converted to a normal .txt
Thank you
__________________
My best friends are parrots

Last edited by PapaParrot; 2nd June 2020 at 05:59 AM.
  #2   (View Single Post)  
Old 2nd June 2020
victorvas victorvas is offline
Real Name: Victor
Fdisk Soldier
 
Join Date: May 2019
Posts: 68

On OpenBSD - packages:
catdoc
docx2txt

On FreeBSD and Linux the best program to do it is:
pandoc
  #3   (View Single Post)  
Old 2nd June 2020
IdOp IdOp is offline
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 917

Thank you both, it's a question I looked at before without much success. It looks like docx2txt could be helpful for my occasional usage.
  #4   (View Single Post)  
Old 2nd June 2020
CiotBSD CiotBSD is offline
c107:b5d::
 
Join Date: Jun 2019
Location: Under /
Posts: 108

Since LibreOffice is installed on my workstation, I also use it to convert from the console:

Code:
$ libreoffice --headless --convert-to odt *.docx
or:

Code:
$ libreoffice --headless --convert-to pdf file
and for text:

Code:
$ libreoffice --headless --convert-to txt:Text file
;-)
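
For anyone who wants to script the commands above, here is a minimal Python sketch that only assembles the argument list and runs it. The function names are made up for illustration, and the `txt:Text` filter name and the `--outdir` option are what I recall from LibreOffice's command-line help, so treat both as assumptions and check `libreoffice --help` on your system.

```python
import shutil
import subprocess


def build_convert_command(source, target="txt:Text", outdir=".", binary="libreoffice"):
    """Return the argv list for a headless LibreOffice conversion.

    target is passed to --convert-to unchanged, e.g. "odt", "pdf" or
    "txt:Text" (the filter name is an assumption, not checked here).
    """
    return [
        binary,
        "--headless",
        "--convert-to", target,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target="txt:Text", outdir=".", binary="libreoffice"):
    """Run the conversion; return False if the binary is missing or fails."""
    if shutil.which(binary) is None:
        return False
    result = subprocess.run(build_convert_command(source, target, outdir, binary))
    return result.returncode == 0
```

Calling `convert("letter.doc")` should leave a `letter.txt` in the current directory if LibreOffice accepts the file.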

@victorvas: thanks for your tips!
__________________
GPG:Fingerprint ed25519 : 072A 4DA2 8AFD 868D 74CF 9EA2 B85E 9ADA C377 5E8E
GPG:Fingerprint rsa4096 : 4E0D 4AF7 77F5 0FAE A35D 5B62 D0FF 7361 59BF 1733
  #5   (View Single Post)  
Old 2nd June 2020
Beastie Beastie is offline
Daemonology student
 
Join Date: Jan 2009
Location: /dev/earth0
Posts: 334

Quote:
Originally Posted by PapaParrot View Post
convert MSword documents/files to a normal readable text format
The one, the only textproc/antiword, of course.
__________________
May the source be with you!
  #6   (View Single Post)  
Old 2nd June 2020
PapaParrot PapaParrot is offline
parrot
 
Join Date: Jul 2015
Location: Durango, Mx.
Posts: 461

Thank you very much. "catdoc" works perfectly for my needs; it is very rare that I even need to view or read this type of document. I just tried catdoc, and it is perfect. Thank you also for the other responses.
====edit====
Thanks, I will look at those as well, but like I said, "catdoc" seems to be fine.
__________________
My best friends are parrots

Last edited by PapaParrot; 2nd June 2020 at 07:26 PM. Reason: posted almost the same time I did,
  #7   (View Single Post)  
Old 2nd June 2020
IdOp IdOp is offline
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 917

Quote:
Originally Posted by PapaParrot View Post
... "catdoc" seems to be fine.
I was able to compile and run catdoc, but it would not work on the .docx file I tried it on because the file was of a 2007+ docx version. I suspect this means it could fail on many real-world .docx files these days? So that is a limitation to have in the back of one's mind at least if you should run into it. The docx2txt script did work on my example file.
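
A quick way to tell which kind of file you actually have, regardless of its extension, is to look at the first few bytes. The sketch below assumes the usual signatures: legacy .doc files are OLE2 compound documents, 2007+ .docx files are zip archives, and RTF files start with `{\rtf`. The function name is hypothetical, and a zip signature only says "zip archive", not "valid .docx".

```python
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a zip archive
RTF_MAGIC = b"{\\rtf"                            # RTF saved with a .doc name


def sniff_word_format(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "docx"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"
```

A result of "doc" suggests trying catdoc or antiword, while "docx" suggests docx2txt. Anything else may be RTF or HTML saved under a Word extension, which is one possible reason a converter reports that a file is not a Word file.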
  #8   (View Single Post)  
Old 3rd June 2020
PapaParrot PapaParrot is offline
parrot
 
Join Date: Jul 2015
Location: Durango, Mx.
Posts: 461

Yeah, I ran into that myself, so I tried "docx2txt". It was able to convert some files, but a couple still failed, so I tried "antiword", which converted a few more. There were still 2 or 3 that none of these could convert; "antiword" said they were not "Word" files. In the end I found an online converter site that was able to convert the remaining files. Fortunately this is something I do not need to do often. It does not make sense to me that MS Windows cannot produce files that are universal and can be read, edited, etc. with a general text editor. Nobody should be forced to use MS Windows and Word whenever some secretary sends them documents, but that would be a whole other topic. Anyway, I did convert all of them and now have them in plain text (.txt) format.
Quote:
I suspect this means it could fail on many real-world .docx files these days?
These files were pretty old, about 5 years; I don't know if that counts as "these days" or not. What seems strange to me is why some docx files were convertible and others not. It seems like there is a consistency problem with the mal-ware used to produce these types of files. Another topic as well.
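
The trial-and-error described in this post (one tool, then the next) can be scripted. This is only a sketch: the tool table is an assumption based on catdoc and antiword both printing the extracted text on standard output when given a file name, and the function name is invented. A tool such as docx2txt would need its own argument list added to the table.

```python
import subprocess

# Hypothetical command table: each entry is the argv without the file name.
DEFAULT_TOOLS = [["catdoc"], ["antiword"]]


def convert_with_fallback(path, tools=DEFAULT_TOOLS, run=subprocess.run):
    """Try each tool in order; return (tool_name, text) for the first success.

    Returns (None, "") when every tool is missing, fails, or prints nothing.
    """
    for argv in tools:
        try:
            result = run(argv + [str(path)], capture_output=True, text=True)
        except OSError:
            continue  # tool is not installed or could not be started
        if result.returncode == 0 and result.stdout.strip():
            return argv[0], result.stdout
    return None, ""
```

An exit status of 0 with non-empty output is only a rough success test; a tool can still print garbage for a format it does not understand, so the result is worth a quick look.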
__________________
My best friends are parrots
  #9   (View Single Post)  
Old 3rd June 2020
IdOp IdOp is offline
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 917

Well 5 years old is still well past 2007, so it seems likely they would be unconvertible by catdoc. But it is also possible that files today could sometimes/often(?) be produced to the pre-2007 standard. Don't know how common that would be.

I'm lucky I don't run into them often. There is one friend of the family that once a year sends a letter about what they did in the past year. It always comes in .docx format. I always politely ask if they could send a .pdf, and they send that and I can read it. Next year, same thing again. I think for typical Windows users there just isn't much mindspace for anything outside their ecosystem, understandably. So we must adapt, and if we can avoid being absorbed it's a victory.
Old 3rd June 2020
Beastie Beastie is offline
Daemonology student
 
Join Date: Jan 2009
Location: /dev/earth0
Posts: 334

Don't forget that .docx files are just "Open XML" files, in other words zipped XML files. So, if all else fails, you can always $ tar xf file.docx and parse the XML files. The main contents are stored in a file called document.xml IIRC. Of course, XML is an abomination, but I did say "if all else fails".
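
If Python is installed, its standard zipfile module can open the archive without any external unpacker. A minimal sketch follows; the function names are hypothetical, and it assumes the main part lives at word/document.xml inside the archive, which is the usual location in files written by Word.

```python
import zipfile


def list_members(docx_path):
    """Return the names of all files stored inside the .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")
```

`read_document_xml` raises KeyError if the archive has no such member, and zipfile.BadZipFile if the file is not a zip archive at all.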
__________________
May the source be with you!
Old 4th June 2020
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 7,034

Just an FYI, Beastie, but tar(1) won't unzip. The archivers/unzip package is needed.
Old 4th June 2020
Beastie Beastie is offline
Daemonology student
 
Join Date: Jan 2009
Location: /dev/earth0
Posts: 334

Ah my bad. It supports zip files on FreeBSD and I assumed it did on OpenBSD too.
__________________
May the source be with you!
Old 4th June 2020
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 7,034

Nope, nor does GNU Tar (archivers/gtar).
Old 5th June 2020
PapaParrot PapaParrot is offline
parrot
 
Join Date: Jul 2015
Location: Durango, Mx.
Posts: 461

Thanks jggimi, I do have
Quote:
The archivers/unzip package is needed.
the archivers/unzip package. Thanks Beastie as well; I see what you mean about using archivers/unzip... pretty hard to read it though, but "if all else fails", well there you go,..
__________________
My best friends are parrots
Old 5th June 2020
IdOp IdOp is offline
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 917

You might be able to get GNU tar to support zip using the -I (capital 'eye', or --use-compress-program) option to call a shell script designed to call zip or unzip as necessary, but I'm not sure I'd really want to go there, lol.

Last edited by IdOp; 5th June 2020 at 03:28 AM.
Old 5th June 2020
Beastie Beastie is offline
Daemonology student
 
Join Date: Jan 2009
Location: /dev/earth0
Posts: 334

Quote:
Originally Posted by PapaParrot View Post
pretty hard to read it though, but "if all else fails", well there you go,..
Okay, how do you like Python?
Code:
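#!/usr/bin/env python3
# Crude docx-to-text: strip the tags from an unzipped document.xml.

import re
from html import unescape

with open('document.xml', encoding='utf-8') as f1:
  # Turn paragraph-closing tags such as </w:p> into newlines.
  text = re.sub(r'<[^<]+w:p ?>', '\n', f1.read())
  # Drop every remaining tag, then decode entities such as &amp;.
  text = unescape(re.sub(r'<[^<]+>', '', text))
  with open('output.txt', 'w', encoding='utf-8') as f2:
    f2.write(text)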
I know it's ugly. Don't sue me.
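
A somewhat less fragile variant of the same idea is to let an XML parser do the work instead of regular expressions. This is a sketch under assumptions, not a full converter: the function name is invented, it only collects `w:t` text runs and `w:tab` elements per `w:p` paragraph, and it ignores tables, footnotes, headers and nested text boxes.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs(xml_bytes):
    """Return one string per w:p element, joining its w:t text runs."""
    root = ET.fromstring(xml_bytes)
    result = []
    for para in root.iter(W_NS + "p"):
        pieces = []
        for node in para.iter():
            if node.tag == W_NS + "t":
                pieces.append(node.text or "")
            elif node.tag == W_NS + "tab":
                pieces.append("\t")
        result.append("".join(pieces))
    return result
```

Reading document.xml in binary mode and writing `"\n".join(paragraphs(data))` to output.txt gives much the same result as the script above, with entities such as `&amp;` decoded by the parser.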
__________________
May the source be with you!
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick