doc to txt conversion

doc to txt conversion

Felix Miata
Which app do we have to strip the glop from a M$ .doc file and output
just the content to a plain text file? OO print to file doesn't seem to
understand anything but postscript. Do I need to "install" a "text printer"?

TIA
--
"Let your conversation be always full of grace." Colossians 4:6 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://mrmazda.no-ip.com/
--
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: doc to txt conversion

Bugzilla from leen.meyer@home.nl
On Saturday 23 December 2006 21:00, Felix Miata wrote:

> Which app do we have to strip the glop from a M$ .doc file and
> output just the content to a plain text file?

Did you try opening the .doc with OO, and "Save As", choosing "Text"
or "Text Encoded"?

Cheers,

Leen

Re: doc to txt conversion

Felix Miata
On 2006/12/23 21:13 (GMT+0100) Leendert Meyer apparently typed:

> On Saturday 23 December 2006 21:00, Felix Miata wrote:

>> Which app do we have to strip the glop from a M$ .doc file and
>> output just the content to a plain text file?

> Did you try opening the .doc with OO, and "Save As", choosing "Text"
> or "Text Encoded"?

From a 790k .doc file, "Text" gets me a 34 byte file and "Text Encoded"
gets me a 62 byte file.

Re: doc to txt conversion

Bugzilla from leen.meyer@home.nl
On Saturday 23 December 2006 21:22, Felix Miata wrote:

> On 2006/12/23 21:13 (GMT+0100) Leendert Meyer apparently typed:
> > On Saturday 23 December 2006 21:00, Felix Miata wrote:
> >> Which app do we have to strip the glop from a M$ .doc file and
> >> output just the content to a plain text file?
> >
> > Did you try opening the .doc with OO, and "Save As", choosing
> > "Text" or "Text Encoded"?
>
> From a 790k .doc file text gets me a 34 byte file and text encoded
> gets me a 62 byte file.

Whoa! A compression ratio of roughly 10k:1. But I'll assume you take
that as a loss ratio and declare it a failure.

How about an indirect conversion, like to .html? It would not be that
difficult to strip the tags. Maybe only special formatting (tables,
lists) would need some care... I hope there's no absolute
positioning.

BTW, what do you mean by 'glop'? Something akin to 'goo'?

Cheers,

Leen
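
The tag-stripping step suggested above can be sketched in a few lines of shell. This is a rough sketch, not a real HTML parser: the function name `strip_tags` is made up for illustration, it assumes no tag spans a line break, and it would leave the contents of `<style>` or `<script>` elements in the output, which probably matters for word-processor exports.

```shell
#!/bin/sh
# strip_tags: crude filter that deletes anything shaped like <...> on a
# single line, then decodes a few common entities.
# Assumption: tags never span a line break.
strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

# Demo: prints "Hello world & friends"
printf '<p>Hello <b>world</b> &amp; friends</p>\n' | strip_tags
```

Real use would be along the lines of `strip_tags < export.html > export.txt`, where `export.html` is a hypothetical file name.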

Re: doc to txt conversion

Felix Miata
On 2006/12/23 21:43 (GMT+0100) Leendert Meyer apparently typed:

> On Saturday 23 December 2006 21:22, Felix Miata wrote:

>> On 2006/12/23 21:13 (GMT+0100) Leendert Meyer apparently typed:

>> > On Saturday 23 December 2006 21:00, Felix Miata wrote:

>> >> Which app do we have to strip the glop from a M$ .doc file and
>> >> output just the content to a plain text file?

>> > Did you try opening the .doc with OO, and "Save As", choosing
>> > "Text" or "Text Encoded"?

>> From a 790k .doc file text gets me a 34 byte file and text encoded
>> gets me a 62 byte file.

> Whoa! A compression ratio of roughly 10k:1. But I'll assume you take
> that as a loss ratio and declare it a failure.

> How about an indirect conversion, like to .html? It would not be that
> difficult to strip the tags.

Looks plenty difficult to me. The file size increased by about 40%. Both
SeaMonkey and Konq fail to display anything legible when opening the
result from disk. If I try to open the result in OO after closing it, it
paints the first page, then does nothing else other than peg the CPU.

> BTW, what do you mean by 'glop'? Something akin to 'goo'?

Everything except the content, roughly 80% of the .doc file, 90%+ of the
html file.

Re: doc to txt conversion

Mike Noble
In reply to this post by Felix Miata
On Saturday 23 December 2006 12:00, Felix Miata wrote:

> Which app do we have to strip the glop from a M$ .doc file and output
> just the content to a plain text file? OO print to file doesn't seem to
> understand anything but postscript. Do I need to "install" a "text
> printer"?
>
> TIA
> --
> "Let your conversation be always full of grace." Colossians 4:6 NIV
>
>  Team OS/2 ** Reg. Linux User #211409
>
> Felix Miata  ***  http://mrmazda.no-ip.com/

dos2unix

Mike

Re: doc to txt conversion

Mike Noble
On Saturday 23 December 2006 13:04, Mike Noble wrote:

> On Saturday 23 December 2006 12:00, Felix Miata wrote:
> > Which app do we have to strip the glop from a M$ .doc file and output
> > just the content to a plain text file? OO print to file doesn't seem to
> > understand anything but postscript. Do I need to "install" a "text
> > printer"?
> >
> > TIA
> > --
> > "Let your conversation be always full of grace." Colossians 4:6 NIV
> >
> >  Team OS/2 ** Reg. Linux User #211409
> >
> > Felix Miata  ***  http://mrmazda.no-ip.com/
>
> dos2unix
>
> Mike

Ignore my message; I replied too soon without reading it fully. :-)

Mike

Re: doc to txt conversion

Bugzilla from leen.meyer@home.nl
In reply to this post by Felix Miata
On Saturday 23 December 2006 22:00, Felix Miata wrote:
> > How about an indirect conversion, like to .html? It would not be
> > that difficult to strip the tags.
>
> Looks plenty difficult to me. The file size increased by about
> 40%. Both SeaMonkey and Konq fail to display anything legible when
> opening the result from disk. If I try to open the result in OO
> after closing it, it paints the first page, then does nothing else
> other than peg the CPU.

Ouch! µ-zoft with µ-compatibility? :-(

Last attempt: can you copy & paste? Does at least _that_ work? :-}

Cheers,

Leen

Re: doc to txt conversion

Carlos E. R.-2
In reply to this post by Felix Miata
The Saturday 2006-12-23 at 15:00 -0500, Felix Miata wrote:

> Which app do we have to strip the glop from a M$ .doc file and output
> just the content to a plain text file? OO print to file doesn't seem to
> understand anything but postscript. Do I need to "install" a "text printer"?

Summary     : library to import Microsoft Word documents
Description : The wv2 library is used to import Microsoft Word documents in koffice for example.

Summary     : Word 8 Converter for Unix
Description : WV is a program that can understand the Microsoft Word 8 binary file
              format (Office97). It currently converts Word into HTML, which can then
              be read with a web browser.

(and there are html to text converters, I think).

--
Cheers,
       Carlos E. R.

Re: doc to txt conversion

Francesco Scaglioni
In reply to this post by Mike Noble
Antiword seems to work well.


Re: doc to txt conversion

James Knott
In reply to this post by Felix Miata
Felix Miata wrote:
> Which app do we have to strip the glop from a M$ .doc file and output
> just the content to a plain text file? OO print to file doesn't seem to
> understand anything but postscript. Do I need to "install" a "text printer"?
>
> TIA
>

Why not just save it as a text (.txt) file???


Re: doc to txt conversion

Felix Miata
In reply to this post by Carlos E. R.-2
On 2006/12/23 22:18 (GMT+0100) Carlos E. R. apparently typed:

> The Saturday 2006-12-23 at 15:00 -0500, Felix Miata wrote:

>> Which app do we have to strip the glop from a M$ .doc file and
>> output just the content to a plain text file? OO print to file
>> doesn't seem to understand anything but postscript. Do I need to
>> "install" a "text printer"?

> Summary     : library to import Microsoft Word documents
> Description : The wv2 library is used to import Microsoft Word
>               documents in koffice for example.

> Summary     : Word 8 Converter for Unix
> Description : WV is a program that can understand the Microsoft Word 8
>               binary file format (Office97). It currently converts Word
>               into HTML, which can then be read with a web browser.

> (and there are html to text converters, I think).

I installed wv and wv2 with YaST, but they haven't shown up in the
menus, and wv from konsole gives command not found, even though rpm
claims they're installed. I can't find wv in /bin, /sbin, /usr/bin or
/usr/sbin. :-(

Re: doc to txt conversion

Bugzilla from leen.meyer@home.nl
On Sunday 24 December 2006 00:03, Felix Miata wrote:

> I installed wv and wv2 with YaST, but they haven't shown up in the
> menus, and wv from konsole gives command not found, even though
> rpm claims they're installed. I can't find wv in /bin, /sbin,
> /usr/bin or /usr/sbin. :-(

I browsed the wv rpm with mc, there are a bunch of wv* in /usr/bin,
and an explanation in /usr/share/doc/packages/wv/README.

In short: you're looking for /usr/bin/wvWare. ;-)

As for the wv2 rpm, there is a .so file. I guess it is used by wv,
but I'm not sure. Maybe http://wvware.sf.net/ has a clue... Going
there now.

Cheers,

Leen

Re: doc to txt conversion

Bugzilla from leen.meyer@home.nl
On Sunday 24 December 2006 00:13, Leendert Meyer wrote:

> On Sunday 24 December 2006 00:03, Felix Miata wrote:
> > I installed wv and wv2 with YaST, but they haven't shown up in
> > the menus, and wv from konsole gives command not found, even
> > though rpm claims they're installed. I can't find wv in /bin,
> > /sbin, /usr/bin or /usr/sbin. :-(
>
> I browsed the wv rpm with mc, there are a bunch of wv* in
> /usr/bin, and an explanation in /usr/share/doc/packages/wv/README.
>
> In short: you're looking for /usr/bin/wvWare. ;-)
>
> As for the wv2 rpm, there is a .so file. I guess it is used by wv,
> but I'm not sure. Maybe http://wvware.sf.net/ has a clue... Going
> there now.

Arg. "rpm -qi -p wv2-*.rpm" says it: a library to import .docs in
KOffice.

Cheers,

Leen

Re: doc to txt conversion

Michael S. Dunsavage
In reply to this post by Felix Miata
On Sat, 2006-12-23 at 18:03 -0500, Felix Miata wrote:

> On 2006/12/23 22:18 (GMT+0100) Carlos E. R. apparently typed:
>
> > The Saturday 2006-12-23 at 15:00 -0500, Felix Miata wrote:
>
> >> Which app do we have to strip the glop from a M$ .doc file and
> >> output just the content to a plain text file? OO print to file
> >> doesn't seem to understand anything but postscript. Do I need to
> >> "install" a "text printer"?
>
> > Summary     : library to import Microsoft Word documents
> > Description : The wv2 library is used to import Microsoft Word
> >               documents in koffice for example.
>
> > Summary     : Word 8 Converter for Unix
> > Description : WV is a program that can understand the Microsoft Word 8
> >               binary file format (Office97). It currently converts Word
> >               into HTML, which can then be read with a web browser.
>
> > (and there are html to text converters, I think).
>
> I installed wv and wv2 with YaST, but they haven't shown up in the
> menus, and wv from konsole gives command not found, even though rpm
> claims they're installed. I can't find wv in /bin, /sbin, /usr/bin or
> /usr/sbin. :-(

whereis wv


Re: doc to txt conversion

Jim Cunning
In reply to this post by James Knott
On Saturday 23 December 2006 13:21, James Knott wrote:
> Felix Miata wrote:
> > Which app do we have to strip the glop from a M$ .doc file and output
> > just the content to a plain text file? OO print to file doesn't seem to
> > understand anything but postscript. Do I need to "install" a "text
> > printer"?
> >
> > TIA
>
> Why not just save it as a text (.txt) file???

I used to use a Linux utility called 'antiword' to read MSWord .doc files and
produce plain text that could be indexed with a utility called glimpse.  As I
recall, it did a pretty good job of preserving the formatting--as much as is
possible with ASCII text--and it certainly removed the 'glop'.

Jim Cunning
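
For anyone wanting to try this, antiword writes the extracted text to standard output, so a conversion is a one-line redirect. The wrapper below is only a sketch: `doc2txt` and `txt_name` are illustrative names, not part of antiword, and it assumes antiword is installed and on the PATH.

```shell
#!/bin/sh
# txt_name FILE: derive the output name, e.g. report.doc -> report.txt.
txt_name() {
    printf '%s\n' "${1%.doc}.txt"
}

# doc2txt FILE.doc: convert one Word file to plain text with antiword.
# Assumption: antiword is on PATH; it prints the text to stdout.
doc2txt() {
    if ! command -v antiword >/dev/null 2>&1; then
        echo "antiword not found on PATH" >&2
        return 127
    fi
    antiword "$1" > "$(txt_name "$1")"
}

# Usage (report.doc is a hypothetical file name):
#   doc2txt report.doc     # writes report.txt next to the input
```

Given the 34-byte result reported earlier in the thread, it is worth checking the size of the output file before trusting it.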

Re: doc to txt conversion

Felix Miata
In reply to this post by Michael S. Dunsavage
On 2006/12/23 18:24 (GMT-0500) Michael S. Dunsavage apparently typed:

> On Sat, 2006-12-23 at 18:03 -0500, Felix Miata wrote:

>> I installed wv and wv2 with YaST, but they haven't shown up in the
>> menus, and wv from konsole gives command not found, even though rpm
>> claims they're installed. I can't find wv in /bin, /sbin, /usr/bin or
>> /usr/sbin. :-(

> whereis wv

Nothing useful there, just /usr/share/wv with a bunch of xml files.

Re: doc to txt conversion

Scott Jones-6
On Saturday 23 December 2006 18:09, Felix Miata wrote:

> On 2006/12/23 18:24 (GMT-0500) Michael S. Dunsavage apparently typed:
> > On Sat, 2006-12-23 at 18:03 -0500, Felix Miata wrote:
> >> I installed wv and wv2 with YaST, but they haven't shown up in the
> >> menus, and wv from konsole gives command not found, even though rpm
> >> claims they're installed. I can't find wv in /bin, /sbin, /usr/bin or
> >> /usr/sbin. :-(
> >
> > whereis wv
>
> Nothing useful there, just /usr/share/wv with a bunch of xml files.

I dunno.  I suppose it's a bit unintuitive to try rpm -ql to list the contents
of a package, but it gives me the following:

scott@inigo:~>rpm -ql wv
/usr/bin/wvAbw
/usr/bin/wvCleanLatex
/usr/bin/wvConvert
/usr/bin/wvDVI
/usr/bin/wvDocBook
/usr/bin/wvHtml
/usr/bin/wvLatex
/usr/bin/wvMime
/usr/bin/wvPDF
/usr/bin/wvPS
/usr/bin/wvRTF
/usr/bin/wvSummary
/usr/bin/wvText
/usr/bin/wvVersion
/usr/bin/wvWare
/usr/bin/wvWml
[...]
/usr/share/doc/packages/wv/README
[...]
/usr/share/man/man1/wvAbw.1.gz
/usr/share/man/man1/wvCleanLatex.1.gz
/usr/share/man/man1/wvDVI.1.gz
/usr/share/man/man1/wvHtml.1.gz
/usr/share/man/man1/wvLatex.1.gz
/usr/share/man/man1/wvMime.1.gz
/usr/share/man/man1/wvPDF.1.gz
/usr/share/man/man1/wvPS.1.gz
/usr/share/man/man1/wvRTF.1.gz
/usr/share/man/man1/wvSummary.1.gz
/usr/share/man/man1/wvText.1.gz
/usr/share/man/man1/wvVersion.1.gz
/usr/share/man/man1/wvWare.1.gz
/usr/share/man/man1/wvWml.1.gz
[...]
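
With a listing this long, it can help to filter it down to just the commands the package installs. A small sketch, assuming an rpm-based system; `only_bin` is a made-up helper name and the sample paths are copied from the listing above.

```shell
#!/bin/sh
# only_bin: keep only paths that sit directly in a bin or sbin directory.
only_bin() {
    grep -E '/s?bin/[^/]+$'
}

# Demo on two lines from the listing; prints /usr/bin/wvText only.
printf '%s\n' /usr/bin/wvText /usr/share/man/man1/wvText.1.gz | only_bin

# On a system with the package installed:
#   rpm -ql wv | only_bin
```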


Re: doc to txt conversion

Randall Schulz
In reply to this post by Felix Miata
On Saturday 23 December 2006 16:09, Felix Miata wrote:

> On 2006/12/23 18:24 (GMT-0500) Michael S. Dunsavage apparently typed:
> > On Sat, 2006-12-23 at 18:03 -0500, Felix Miata wrote:
> >> I installed wv and wv2 with YaST, but they haven't shown up in the
> >> menus, and wv from konsole gives command not found, even though
> >> rpm claims they're installed. I can't find wv in /bin, /sbin,
> >> /usr/bin or /usr/sbin. :-(
> >
> > whereis wv
>
> Nothing useful there, just /usr/share/wv with a bunch of xml files.

Then it's installed. Keep "apropos" or "man -k" in your repertoire:

% apropos wv
wvdialconf (1)       - build a configuration file for wvdial (1)
wvWare (1)           - convert msword documents
wvAbw (1)            - convert msword documents to Abiword's format
wvDVI (1)            - convert msword documents to DVI
wvHtml (1)           - convert msword documents to HTML4.0
wvLatex (1)          - convert msword documents to LaTeX
wvCleanLatex (1)     - convert msword documents to LaTeX
wvPDF (1)            - convert msword documents to PDF
wvPS (1)             - convert msword documents to PS
wvRTF (1)            - convert msword documents to RTF
wvText (1)           - convert msword documents to text
wvWml (1)            - convert msword documents to WML
nwvolinfo (1)        - Diplay info on NetWare Volumes
wvdial (1)           - PPP dialer with built-in intelligence.
wvMime (1)           - view MSWord documents
wvSummary (1)        - view word document's summary info
wvVersion (1)        - view word document's version #
wvline (3ncurses)    - create curses borders, horizontal and vertical lines
mvwvline (3ncurses)  - create curses borders, horizontal and vertical lines
mvwvline_set (3ncurses) - create curses borders or lines using complex characters and renditions
wvline_set (3ncurses) - create curses borders or lines using complex characters and renditions
wvdial.conf (5)      - wvdial configuration file


Obviously some of these are irrelevant, but I'm too lazy to edit them
out right now.

Judging from the synopses, "wvWare" is the main command to use:

% man wvWare
...
DESCRIPTION
wvWare converts word documents into other formats such as PS,PDF,HTML,LaTeX,DVI,ABW


Randall Schulz
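
The irrelevant entries can be filtered out mechanically, since every Word-related synopsis in the list above mentions "word" in some capitalization and none of the others do. A sketch, with `only_word` as an illustrative name and the sample lines taken from that output:

```shell
#!/bin/sh
# only_word: keep apropos lines whose text mentions "word" in any case.
only_word() {
    grep -i 'word'
}

# Demo: the wvdial line is dropped, the wvText line is kept.
printf '%s\n' \
    'wvdial (1)           - PPP dialer with built-in intelligence.' \
    'wvText (1)           - convert msword documents to text' | only_word

# On a live system:
#   apropos wv | only_word
```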

Re: doc to txt conversion

Felix Miata
In reply to this post by Bugzilla from leen.meyer@home.nl
On 2006/12/24 00:13 (GMT+0100) Leendert Meyer apparently typed:

> On Sunday 24 December 2006 00:03, Felix Miata wrote:

>> I installed wv and wv2 with YaST, but they haven't shown up in the
>> menus, and wv from konsole gives command not found, even though
>> rpm claims they're installed. I can't find wv in /bin, /sbin,
>> /usr/bin or /usr/sbin. :-(

> I browsed the wv rpm with mc, there are a bunch of wv* in /usr/bin,
> and an explanation in /usr/share/doc/packages/wv/README.

> In short: you're looking for /usr/bin/wvWare. ;-)

Seems to be useless. Word8/97 is a virtually 10-year-old file format.
Whether I use wvHtml or wvText all I get for output is a 0 byte file,
with no error messages from wvHtml, and the message "Could not convert
to HTML" from wvText. :-(

However, that README points to http://wvware.sourceforge.net/ which in
turn recommends using abiword instead. Abiword shows up in the menu, and
creates HTML that SeaMonkey can open, and usable plain text. :-)

Abiword has a much longer list of file formats it can import and export
than OO. So, why is OO installed by default instead of Abiword?
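
The thread ends up with three candidate tools: antiword, AbiWord, and wvText. A script can simply try them in the order of how well they fared here. The sketch below is hedged accordingly: `pick_converter` and `convert_doc` are illustrative names, the argument order for wvText (input, then output) and the `--to` and `--to-name` options for abiword are assumptions about the installed versions, so check each tool's man page or `--help` before relying on them.

```shell
#!/bin/sh
# pick_converter: print the name of the first converter found on PATH.
# The order is a judgment call based on the reports in this thread.
pick_converter() {
    for tool in antiword abiword wvText; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# convert_doc SRC DST: convert SRC (.doc) to plain text in DST.
convert_doc() {
    case $(pick_converter) in
        antiword) antiword "$1" > "$2" ;;                  # text on stdout
        abiword)  abiword --to=txt --to-name="$2" "$1" ;;  # assumed options
        wvText)   wvText "$1" "$2" ;;                      # assumed arg order
        *)        echo "no converter found" >&2; return 1 ;;
    esac
}

# Usage (file names are hypothetical):
#   convert_doc report.doc report.txt
```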