Dictionaries - usage, upstream, updates

Vojtěch Zeisek-2
Hi,
there is recent work on a new Czech dictionary, which should have a better
vocabulary and should handle complex Czech grammar better. The work is
still far from complete, but so far it's promising. Some info (mostly in
Czech, sorry) is at https://gitlab.com/strepon/czech-cc0-dictionaries/ and
http://ceskeslovniky.cz/about.html and https://github.com/l10ncz/
We discussed this at https://www.linuxdays.cz/2019/en/ last weekend (BTW,
(open)SUSE had the best presentation there) and there were a couple of
questions we were unable to answer, so I seek advice here. :-)
The dictionary above is now available as a hunspell add-on for LibreOffice
(LO). This format is also used by Firefox, Thunderbird and more. But what
about KDE, GNOME, other DEs, Qt (non-KDE) apps? Others? What do they use? In
other words, what is the purpose of having distribution packages for aspell,
ispell, hunspell and myspell? What is the relationship among them? What is
used for what?
Regarding hunspell/myspell, what is the upstream for the dictionaries? We were
unable to find out where the Czech dictionary came from. :-)
One day we'd like to merge the new dictionary with the existing one, but that
requires much more work first. Anyway, it'd be good to have a distribution
package available for testing. We then wonder if it's possible to have e.g.
myspell-cs_CZ and myspell-cs_CZ_EXPERIMENTAL installed together and somehow
switch between them. Or more generally, how is the packaging of dictionaries
organized, both technically and in terms of getting the data?
I'm looking forward to any pointers on this topic.
V.

--
Vojtěch Zeisek

Komunita openSUSE GNU/Linuxu
Community of the openSUSE GNU/Linux

https://www.opensuse.org/
https://trapa.cz/

Re: Dictionaries - usage, upstream, updates

Tomas Chvatal-2
Vojtěch Zeisek wrote on Tue, 08. 10. 2019 at 09:31 +0200:

Hello Vojtech,

I so wanted to join this chat, but sadly, as an organizer, I didn't get the
time to do much besides handling the conference itself.

The first thing I have is that we set up the dictionaries for all languages
to be auto-generated from the libreoffice-dictionaries repository.
The previous approach was individual projects and packages, and it was a
pain in the butt to keep them up to date and working well.

So I would first recommend taking that project and integrating it into
LO [1]; we will then inherit it automatically.

For interim testing you can name the package e.g. 'hunspell-new-cs_CZ',
have it provide the symbols of the other Czech dictionary name and at the
same time conflict with it. That will let users choose the dictionary.

But I can't stress enough how much pain it was to keep separate packages
for the various languages, so really try to integrate it into
LibreOffice.

Cheers

Tom

[1] https://cgit.freedesktop.org/libreoffice/dictionaries/

Re: Dictionaries - usage, upstream, updates

Vojtěch Zeisek-2
Hi,
sorry for the long silence, my mail system just quietly dropped this mail as
spam (I had to download it from the archive)... :-/ I'll try to resurrect some
questions below...

On Tuesday, 8 October 2019 9:51:29 CET, Tomas Chvatal wrote:

> Vojtěch Zeisek wrote on Tue, 08. 10. 2019 at 09:31 +0200:
> > Hi,
> > there is recent work on new Czech dictionary, which should
> > have better vocabulary and should be better in handling complex
> > Czech grammar. The work is still far from being complete, but
> > so far it's promising. Some info (mostly in Czech, sorry) is at
> > https://gitlab.com/strepon/czech-cc0-dictionaries/ and
> > http://ceskeslovniky.cz/about.html and https://github.com/l10ncz/
> > We discussed this at https://www.linuxdays.cz/2019/en/ last
> > weekend (BTW, (open)SUSE had the best presentation there) and
> > there were couple of questions we were unable to answer, so I
> > seek advice here. :-)
> > The dictionary above is now available as hunspell addon for LO.
> > This format is used also by FF, TB and more. But what about KDE,
> > GNOME, other DEs, Qt (non-KDE) apps? Others? What do they use?
I'm sorry, I may have been looking in the wrong places, but I didn't find
which dictionaries are used by the desktop environments and other big
projects... Any idea?

> > In another words, what is purpose of having distribution packages
> > for aspell, ispell, hunspell, myspell? What is relationship among
> > them? What is used for what?

I still wonder what is used for what...

> > Regarding hunspell/myspell, what is upstream for the dictionaries?

So the upstream is LO. Is it so for all projects/distributions?

> > We were unable to find where does the Czech dictionary came from. :-)
> > One day we'd like to merge the new dictionary with the existing
> > one, but it requires much more work now. Anyway, it'd be good to
> > have distribution package available for testing. We then wonder
> > if it's possible to have installed together e.g. myspell-cs_CZ
> > and myspell-cs_CZ_EXPERIMENTAL and somehow switch between them.
> > Or more general, how is packaging of dictionaries organized?
> > Technically as well as getting the data.
> > I'm looking forward for any point on this topic.
> > V.
>
> Hello Vojtech,
> I so wanted to join this chat, but sadly as and Org didn't get
> the time to do much else except handling the conference itself.
> The first thing I have is that we set up the dictionaries to be
> auto-generated from the libreoffice-dictionaries repository for
> all the languages.
Do you mean all hunspell dictionaries, or also myspell and I don't know what
else? So for openSUSE, LO is the upstream? Do you know if it's the general
upstream for everyone using hunspell?

> The previous approach was individual projects and packages and
> it was pain in the butt to keep it up-to-date and working well.

I can imagine. I wonder how this is organized in other distributions, too.

> So I would first recommend to take that project and integrate
> them in LO [1] and we will automatically inherit them.

According to Stanislav Horáček, who maintains the new project, it's not ready
yet...

> For the interim testing you can name the package as i.e. 'hunspell
> -new-cs_CZ' and have it provide the symbols of the other czech
> dictionary name and at the same time conflict with it. It will
> let users choose the dictionary.

Do you mean using update-alternatives according to
<https://en.opensuse.org/openSUSE:Packaging_Multiple_Version_guidelines>?

> But I can't stress enough how much pain it was to keep the
> separate pkgs for various languages so really try to integrate
> it in the libreoffice.

It's planned, but it seems there is still plenty of work to be done, so it
won't be soon...

> Cheers
> Tom
> [1] https://cgit.freedesktop.org/libreoffice/dictionaries/

Kind regards,
V.

Re: Dictionaries - usage, upstream, updates

Tomas Chvatal-2
Vojtěch Zeisek wrote on Tue, 29. 10. 2019 at 13:28 +0100:

> Hi,
> sorry for long silence, my mail system just quietly dropped this mail
> as spam (I had to download it from the archive)... :-/ I'll try to
> resurrect some questions below...
>
> On Tuesday, 8 October 2019 9:51:29 CET, Tomas Chvatal wrote:
> > Vojtěch Zeisek wrote on Tue, 08. 10. 2019 at 09:31 +0200:
> > > Hi,
> > > there is recent work on new Czech dictionary, which should
> > > have better vocabulary and should be better in handling complex
> > > Czech grammar. The work is still far from being complete, but
> > > so far it's promising. Some info (mostly in Czech, sorry) is at
> > > https://gitlab.com/strepon/czech-cc0-dictionaries/ and
> > > http://ceskeslovniky.cz/about.html and https://github.com/l10ncz/
> > > We discussed this at https://www.linuxdays.cz/2019/en/ last
> > > weekend (BTW, (open)SUSE had the best presentation there) and
> > > there were couple of questions we were unable to answer, so I
> > > seek advice here. :-)
> > > The dictionary above is now available as hunspell addon for LO.
> > > This format is used also by FF, TB and more. But what about KDE,
> > > GNOME, other DEs, Qt (non-KDE) apps? Others? What do they use?
>
> I'm sorry, I could be looking wrongly, I didn't find which
> dictionaries are used by all the desktop environments and another big
> projects... Any idea?
Everything uses hunspell. Some minor CLI tools are based on
aspell/ispell, but everything in your GUI uses hunspell by default,
regardless of whether it is KDE/GNOME/whatever.

For simplicity you can also think of hunspell == myspell.

>
> > > In another words, what is purpose of having distribution packages
> > > for aspell, ispell, hunspell, myspell? What is relationship among
> > > them? What is used for what?
>
> I still wonder what is used for what...

Ispell: the oldest spell checker, probably used by a few old-timers.
Aspell: the GNU spell-checking toolset, used by a few packages.
Myspell: sort of an older variant of hunspell.
Hunspell: the latest spell checker, used by LO and all desktops.

>
> > > Regarding hunspell/myspell, what is upstream for the
> > > dictionaries?
>
> So the upstream is LO. Is it so for all projects/distributions?

Any distribution can do it on its own. We decided to go this route to
keep it easy and not have a million conflicting packages that people
can use.

>
> > > We were unable to find where does the Czech dictionary came from.
> > > :-)
> > > One day we'd like to merge the new dictionary with the existing
> > > one, but it requires much more work now. Anyway, it'd be good to
> > > have distribution package available for testing. We then wonder
> > > if it's possible to have installed together e.g. myspell-cs_CZ
> > > and myspell-cs_CZ_EXPERIMENTAL and somehow switch between them.
> > > Or more general, how is packaging of dictionaries organized?
> > > Technically as well as getting the data.
> > > I'm looking forward for any point on this topic.
> > > V.
> >
> > Hello Vojtech,
> > I so wanted to join this chat, but sadly as and Org didn't get
> > the time to do much else except handling the conference itself.
> > The first thing I have is that we set up the dictionaries to be
> > auto-generated from the libreoffice-dictionaries repository for
> > all the languages.
>
> Do You mean all hunspell dictionaries, or also myspell and I don't
> know what else? So for openSUSE, is LO the upstream? Do You know if
> it's general upstream for everyone using hunspell?
It is not a general upstream. But other distros base their packages on it
too, and it is a much better place to put the dictionary than any other.
>
> > The previous approach was individual projects and packages and
> > it was pain in the butt to keep it up-to-date and working well.
>
> I can imagine, I wonder how this is organized, also in other
> distributions.
>
Some have individual packages maintained by volunteers for the specific
languages, some do it like us.

> > So I would first recommend to take that project and integrate
> > them in LO [1] and we will automatically inherit them.
>
> According to Stanislav Horáček maintaining the new project, it's not
> ready yet...
>
> > For the interim testing you can name the package as i.e. 'hunspell
> > -new-cs_CZ' and have it provide the symbols of the other czech
> > dictionary name and at the same time conflict with it. It will
> > let users choose the dictionary.
>
> Do You mean usage of update-alternatives according to
> <https://en.opensuse.org/openSUSE:Packaging_Multiple_Version_guidelines> ?
Nope, just create a package with a new name and add Provides: and
Conflicts: lines to the spec file, so that either the new one or the old
one is installed, never both.
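
As a rough sketch only, the relevant part of such a spec file might look
like the fragment below. The package names are the ones mentioned in this
thread; whether the existing package is really called myspell-cs_CZ, and
which further symbols it provides, is an assumption that has to be checked
against the actual package.

```spec
# Hypothetical preamble fragment for the test package (names taken from
# the thread, not verified against the real openSUSE packages).
Name:           hunspell-new-cs_CZ
# ... Version, Release, Summary, License and the rest of the preamble ...

# Satisfy anything that requires the existing Czech dictionary package.
Provides:       myspell-cs_CZ
# Make sure only one of the two dictionaries can be installed at a time.
Conflicts:      myspell-cs_CZ
```

With lines like these, installing the test package while the old one is
present should make the resolver ask which of the two to keep, which is
the "either the new one or the old one" behaviour described above.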

>
> > But I can't stress enough how much pain it was to keep the
> > separate pkgs for various languages so really try to integrate
> > it in the libreoffice.
>
> It'd planned, but it seems there is still plenty of work to be done,
> so it won't be soon...
>

Well, when it's ready it will be nice, I bet :)

Cheers

Tom

Re: Dictionaries - usage, upstream, updates

Vojtěch Zeisek-2
On Wednesday, 30 October 2019 8:50:55 CET, Tomas Chvatal wrote:

Perfect :-) Thank you for the nice summary, it's pretty clear and
straightforward now.
