difference between two disks


Re: difference between two disks

jdd@dodin.org
On 23/01/2018 at 08:18, Bernhard Voelker wrote:
> On 01/22/2018 09:00 PM, [hidden email] wrote:
>> I couldn't have two identical copies, I give up:-(
>
> well, you could surely achieve this by creating a copy of
> the (unmounted!) file system instead of the content therein.
> E.g. with dd [1], you need the arguments:


Sure, but what for? I have no way of knowing whether the original copy
itself has some problems...

I sometimes have weird file names, most probably UTF-8 badly converted
from another file system, which show up in Dolphin as a black square
(NUL?) in the file name. Dolphin refuses to manage them; only * or ? can
match them, so I can remove them or rename them with simpler names if
I can catch them, but some reappear from time to time, and I don't
really know why.

This may, or may not, be the problem.

Anyway, I just made another rsync (same archive, but the target is
another disk) and there is also some difference.

ghost in the shell?? :-)

jdd


--
http://dodin.org

--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]


Re: difference between two disks

Bernhard Voelker
On 01/23/2018 11:44 AM, [hidden email] wrote:

> On 23/01/2018 at 08:18, Bernhard Voelker wrote:
>> On 01/22/2018 09:00 PM, [hidden email] wrote:
>>> I couldn't have two identical copies, I give up:-(
>>
>> well, you could surely achieve this by creating a copy of
>> the (unmounted!) file system instead of the content therein.
>> E.g. with dd [1], you need the arguments:
>
>
> sure, but what for? I have no clue if the original copy do not have some problems...
>
> I sometime have weird file names, most probably some utf8 badly converted from other file system that are visible on dolphin by black square (nul?) in the file name. Dolphin refuses do manage them,
> only * or ? can do, so I can remove them or rename them with simpler values, if I can catch them, but some reappear from time to time, I don't know really why
>
> this may - or not - be the problem.
>
> anyway I just made an other rsync (same archive, but the target is an other disk) and there is also some difference
>
> ghost in the shell?? :-)

well, a dd clone doesn't care about what files and file names are inside;
it doesn't even know that the data copied is a file system. ;-)

Regarding bad file names: even if an application shows it with * or ?
or whatever, it should be able to handle the file.
You can also rename them to proper names.
However, I don't think this affects whether an rsync'ed copy has the
same disk usage.
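
To track down such file names from a shell, one possible sketch is the following. It assumes GNU find; the function name list_odd_names is made up for illustration. It lists every name containing bytes outside printable ASCII, which also includes perfectly valid accented names, so the result is a superset of the badly converted ones.

```shell
# Hypothetical helper: list entries whose names contain bytes outside
# printable ASCII (space to tilde).
# LC_ALL=C makes find match names byte by byte, so multi-byte or invalid
# sequences are caught regardless of the current locale.
list_odd_names() {
    LC_ALL=C find "${1:-.}" -name '*[! -~]*' -print
}
```

Piping the result through "cat -v" makes the raw bytes visible on a terminal.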

As mentioned earlier: use "du --apparent-size ..." to compare the sizes,
or use "find -xdev -type f -printf '%s %p\n'" to get such a list for
comparison.  What matters is not the gross number of bytes the file system
uses for storing a file, but the net size of its content.
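
A minimal sketch of that comparison, assuming GNU find and sort; the function name size_list and the placeholders SRC and DST are illustrative only. Running find from inside each tree keeps the paths relative, and sorting by path makes the two lists line up for diff.

```shell
# Hypothetical helper: print "size path" for every regular file under a tree,
# staying on one file system (-xdev) and sorted by path.
size_list() {
    ( cd "$1" && find . -xdev -type f -printf '%s %p\n' | LC_ALL=C sort -k2 )
}

# Usage sketch (SRC and DST stand for the two mount points):
#   size_list SRC > src.sizes
#   size_list DST > dst.sizes
#   diff src.sizes dst.sizes
```

An empty diff would mean that both trees hold the same regular files with the same content sizes; it says nothing about the content itself.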

Have a nice day,
Berny



Re: difference between two disks

jdd@dodin.org
On 23/01/2018 at 13:28, Bernhard Voelker wrote:

> comparison.  It's not the brutto bytes the file system uses for storing a file
> which is important, but the net content' size.
>
Sure, but nevertheless I have yet to understand why copies made with
rsync are not identical :-(

But maybe another time.

thanks
jdd


--
http://dodin.org


Re: difference between two disks

Bernhard Voelker
On 01/23/2018 01:55 PM, [hidden email] wrote:
> sure, but never the less I have yet to understand why copies made with rsync are not identical :-(

They are - from the point of view of the content of the files, and with e.g. the -HAXax options
also from the user/group/permission/hardlink/attributes point of view.  Still, the file
system may e.g. squeeze 10M of NULs inside a file and therefore store it differently.
Therefore, du(1) without the --apparent-size option will show what the file system
reports about the space allocated to a file:

  $ truncate -s1T file

  $ ls -log file
  -rw-r--r-- 1 1099511627776 Jan 23 14:12 file

  $ du -h file
  0 file

  $ du -h --apparent-size file
  1.0T file

It's just your expectation of du's output which does not match how things work.
Misunderstandings like that are e.g. explained here:

https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#df-and-du-report-different-information

Have a nice day,
Berny



Re: difference between two disks

Carlos E. R.-2
In reply to this post by jdd@dodin.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Monday, 2018-01-22 at 21:00 +0100, [hidden email] wrote:

> I couldn't have two identical copies, I give up :-(
>
> just hope than nothing important is missed

Make a listing of both directories and compare them with diff. This was
suggested in an earlier post. It will find the differences.

- --
Cheers,
        Carlos E. R.
        (from openSUSE 42.2 x86_64 "Malachite" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlpnPfoACgkQtTMYHG2NR9WEDQCgidminRXlZ8QroX8SYoO6bcPf
8X0An0RXTsXI0bjmTyxuN09UGdDXxZhN
=iBWa
-----END PGP SIGNATURE-----


Re: difference between two disks

jdd@dodin.org
In reply to this post by Bernhard Voelker
On 23/01/2018 at 14:18, Bernhard Voelker wrote:
> On 01/23/2018 01:55 PM, [hidden email] wrote:
>> sure, but never the less I have yet to understand why copies made with rsync are not identical :-(
>
> they are - from the point of view of the content of the files, and with e.g. -HAXax option
> also from user/group/permission/hardlink/attributes point of view.  Still, the file
> system may e.g. squeeze 10M of NULs inside a file and therefore store it differently.

including with identical checksum!

> Therefore, du(1) without the --apparent-size option will show what the file system
> reports about the size of a file:

same problem

# du --apparent-size -s /run/media/jdd/intenso*
3239147738      /run/media/jdd/intenso4to
3239148282      /run/media/jdd/intenso5to2


> It's just your expectation of du's output which does not match how things work.
> Misunderstandings like that are e.g. explained here:
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#df-and-du-report-different-information

The difference between du and df is another problem: even a small file
takes up a full sector-sized chunk (I am talking about static data files,
both archives, so no unlinked files AFAIK).

curious

jdd
--
http://dodin.org


Re: difference between two disks

jdd@dodin.org
In reply to this post by Carlos E. R.-2
On 23/01/2018 at 14:51, Carlos E. R. wrote:

> Make a listing of both directories and compare them with diff. Was
> suggested in some post. It will find the differences.


I did, but got a long listing I didn't understand :-(

jdd


--
http://dodin.org


Re: difference between two disks

Per Jessen
[hidden email] wrote:

> On 23/01/2018 at 14:51, Carlos E. R. wrote:
>
>> Make a listing of both directories and compare them with diff. Was
>> suggested in some post. It will find the differences.
>
>
> I did, but got a long listing I didn't understand :-(

A long diff listing suggests lots of differences.  
You could post it, someone will no doubt know what it says.



--
Per Jessen, Zürich (7.8°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.



Re: difference between two disks

Carlos E. R.-2
In reply to this post by jdd@dodin.org
On 2018-01-23 15:07, [hidden email] wrote:
> On 23/01/2018 at 14:51, Carlos E. R. wrote:
>
>> Make a listing of both directories and compare them with diff. Was
>> suggested in some post. It will find the differences.
>
>
> I did, but got a long listing I didn't understand :-(

Well, then we can work on that; it should be a short list.

--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)



Re: difference between two disks

Andrei Borzenkov
In reply to this post by jdd@dodin.org


Sent from iPhone

> On 23 Jan 2018, at 15:55, "[hidden email]" <[hidden email]> wrote:
>
>> On 23/01/2018 at 13:28, Bernhard Voelker wrote:
>>
>> comparison.  It's not the brutto bytes the file system uses for storing a file
>> which is important, but the net content' size.
> sure, but never the less I have yet to understand why copies made with rsync are not identical :-(
>

One obvious example: a file had a non-zero block (and so it consumed real storage) which was later overwritten with zeroes. It will continue to consume a real block on the source, but on the destination it is replaced by a hole (no storage consumption).

Re: difference between two disks

Carlos E. R.-2
In reply to this post by Carlos E. R.-2
On 2018-01-23 15:23, Carlos E. R. wrote:

> On 2018-01-23 15:07, [hidden email] wrote:
>> On 23/01/2018 at 14:51, Carlos E. R. wrote:
>>
>>> Make a listing of both directories and compare them with diff. Was
>>> suggested in some post. It will find the differences.
>>
>>
>> I did, but got a long listing I didn't understand :-(
>
> Well, then we can work at that, it should be a short list.

You may run

find . -ls

on the root of each tree, which will print information on each file, relative to the current directory:


193485    4 -rw-rw-r--   2 news     news         1450 Dec  7  2015 ./var_spool_news/alt/linux/suse/2030
193486    4 -rw-rw-r--   2 news     news         1782 Dec  7  2015 ./var_spool_news/alt/linux/suse/2031
193487    4 -rw-rw-r--   2 news     news         1520 Dec  7  2015 ./var_spool_news/alt/linux/suse/2032
193488    4 -rw-rw-r--   2 news     news         1939 Dec  7  2015 ./var_spool_news/alt/linux/suse/2033


File: find.info,  Node: Print File Information,  Next: Run Commands,  Prev: Print File Name,  Up: Actions

3.2 Print File Information
==========================

 -- Action: -ls
     True; list the current file in `ls -dils' format on the standard
     output.  The output looks like this:

          204744   17 -rw-r--r--   1 djm      staff       17337 Nov  2  1992 ./lwall-quotes

     The fields are:

       1. The inode number of the file.  *Note Hard Links::, for how to
          find files based on their inode number.

       2. the number of blocks in the file.  The block counts are of 1K
          blocks, unless the environment variable `POSIXLY_CORRECT' is
          set, in which case 512-byte blocks are used.  *Note Size::,
          for how to find files based on their size.

       3. The file's type and file mode bits.  The type is shown as a
          dash for a regular file; for other file types, a letter like
          for `-type' is used (*note Type::).  The file mode bits are
          read, write, and execute/search for the file's owner, its
          group, and other users, respectively; a dash means the
          permission is not granted.  *Note File Permissions::, for
          more details about file permissions.  *Note Mode Bits::, for
          how to find files based on their file mode bits.

       4. The number of hard links to the file.

       5. The user who owns the file.

       6. The file's group.

       7. The file's size in bytes.

       8. The date the file was last modified.

       9. The file's name.  `-ls' quotes non-printable characters in the
          file names using C-like backslash escapes.  This may change
          soon, as the treatment of unprintable characters is
          harmonised for `-ls', `-fls', `-print', `-fprint', `-printf'
          and `-fprintf'.



--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)



Re: difference between two disks

jdd@dodin.org
In reply to this post by Andrei Borzenkov
On 23/01/2018 at 15:26, Andrei Borzenkov wrote:


> One obvious example - file had non-zero block (and so it consumed real storage) which was later overwritten by zeroes. It will continue to consume real block on source, but on destination it is replaced by hole (no storage consumption).
>
even with checksum control??

jdd

--
http://dodin.org


Re: difference between two disks

Andrei Borzenkov
23.01.2018 22:27, [hidden email] wrote:

> On 23/01/2018 at 15:26, Andrei Borzenkov wrote:
>
>
>> One obvious example - file had non-zero block (and so it consumed real
>> storage) which was later overwritten by zeroes. It will continue to
>> consume real block on source, but on destination it is replaced by
>> hole (no storage consumption).
>>
> even with checksum control??
>

checksums will see zeros in both cases.
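
A small sketch of this effect, assuming GNU coreutils (dd, truncate, sha256sum, du) and a file system that supports holes, such as ext4. The directory and file names are made up for the demonstration. One file has its zeros written out, the other consists of a single hole; a reader of either file sees the same bytes.

```shell
# Sketch: two files with identical content (1 MiB of zero bytes),
# one written out block by block, the other created as a hole.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/written" bs=1M count=1 status=none
truncate -s 1M "$demo/hole"

# Same bytes, therefore the same checksum ...
sum_written=$(sha256sum < "$demo/written")
sum_hole=$(sha256sum < "$demo/hole")

# ... and the same apparent size, although the allocated size may differ.
du --apparent-size -k "$demo/written" "$demo/hole"
du -k "$demo/written" "$demo/hole"
```

On file systems that compress data or detect runs of zeros by themselves, both files may end up with a small allocated size, so the second du call is not guaranteed to show a difference.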


Re: difference between two disks

Bernhard Voelker
In reply to this post by jdd@dodin.org
On 01/23/2018 02:54 PM, [hidden email] wrote:
> same problem
>
> # du --apparent-size -s /run/media/jdd/intenso*
> 3239147738      /run/media/jdd/intenso4to
> 3239148282      /run/media/jdd/intenso5to2

And then there are e.g. directories, whose own size is counted too:

  $ mkdir d1 d2

Create 10000 files in 'd1':

  $ seq -f "d1/%g" 10000 | xargs touch

... and remove them again:

  $ rm d1/*

now see the difference:

  $ du -s d1 d2
  156 d1
  4 d2

Have a nice day,
Berny


Re: difference between two disks

jdd@dodin.org
In reply to this post by Andrei Borzenkov
On 23/01/2018 at 20:29, Andrei Borzenkov wrote:

> 23.01.2018 22:27, [hidden email] wrote:
>> On 23/01/2018 at 15:26, Andrei Borzenkov wrote:
>>
>>
>>> One obvious example - file had non-zero block (and so it consumed real
>>> storage) which was later overwritten by zeroes. It will continue to
>>> consume real block on source, but on destination it is replaced by
>>> hole (no storage consumption).
>>>
>> even with checksum control??
>>
>
> checksums will see zeros in both cases.
>
really curious

thanks
jdd

--
http://dodin.org


Re: difference between two disks

jdd@dodin.org
In reply to this post by Bernhard Voelker
On 23/01/2018 at 20:36, Bernhard Voelker wrote:

> On 01/23/2018 02:54 PM, [hidden email] wrote:
>> same problem
>>
>> # du --apparent-size -s /run/media/jdd/intenso*
>> 3239147738      /run/media/jdd/intenso4to
>> 3239148282      /run/media/jdd/intenso5to2
>
> and there's e.g. directories:
>
>   $ mkdir d1 d2
>
> Create 10000 files in 'd1':
>
>   $ seq -f "d1/%g" 10000 | xargs touch
>
> ... and remove them again:
>
>   $ rm d1/*
>
> now see the difference:
>
>   $ du -s d1 d2
>   156    d1
>   4    d2

So even files that no longer exist and are not open can take up space...
Is there any way to reclaim the space?

And is it the same with --apparent-size?

So is there a way to really compare two mirrored disks to see if the
copy is good (ext4)?

thanks
jdd

--
http://dodin.org


Re: difference between two disks

jdd@dodin.org
In reply to this post by Carlos E. R.-2
On 23/01/2018 at 15:56, Carlos E. R. wrote:

>> Well, then we can work at that, it should be a short list.

I tried again. The two disks are not the same as before, but the files are,
and they also come from rsync --delete.

linux-owxt:/run/media/jdd/intenso5to2 # ls -aR > ../5t02.txt
linux-owxt:/run/media/jdd/intenso5to2 # cd ../intenso4to/
linux-owxt:/run/media/jdd/intenso4to # ls -aR > ../4to.txt
linux-owxt:/run/media/jdd/intenso4to # cd ..
linux-owxt:/run/media/jdd # diff -b 5t02.txt 4to.txt  > diff.txt

diff.txt is zero bytes

linux-owxt:/run/media/jdd # du --apparent-size -s /run/media/jdd/intenso*
3239147738      /run/media/jdd/intenso4to
3239148282      /run/media/jdd/intenso5to2

> find . -ls
>
linux-owxt:/run/media/jdd/intenso5to2 # find . -ls > ../find5.txt
linux-owxt:/run/media/jdd/intenso5to2 # cd ../intenso4to/
linux-owxt:/run/media/jdd/intenso4to # find . -ls > ../find4.txt

The sizes of the two listing files are not identical, but each is
more than 300 MB.

# diff -b find5.txt find4.txt  > diff2.txt

Then diff2.txt is very large, about the sum of the two files, so surely
my diff syntax is not right.

What can I do?

thanks
jdd

--
http://dodin.org


Re: difference between two disks

Bernhard Voelker
In reply to this post by jdd@dodin.org
On 01/23/2018 09:00 PM, [hidden email] wrote:
> so even non existent and not openned files can take place... any way to
> reclame the space?

AFAIK no.  The only workaround I know is to create a new
directory "d1.new", move all remaining entries (if any) from "d1"
to "d1.new", then rmdir "d1" and rename "d1.new" to "d1".
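
The workaround could be sketched as below, assuming GNU find and mv; the function name rebuild_dir is hypothetical. Note that the new directory is created with default ownership, mode and timestamps, which are not carried over from the old one, and that nothing else should be using the directory while this runs.

```shell
# Hypothetical helper following the workaround above: replace a directory
# by a freshly created one so that its own size starts small again.
rebuild_dir() {
    dir=$1
    mkdir "$dir.new" || return 1
    # Move every remaining entry, including dot files, into the new directory.
    find "$dir" -mindepth 1 -maxdepth 1 -exec mv -t "$dir.new" {} + || return 1
    # rmdir only succeeds if the old directory is now empty.
    rmdir "$dir" && mv "$dir.new" "$dir"
}
```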

> and same with --apparent-size?

yes, this is /apparent/ size. ;-)
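
Since the size of the directories themselves is part of the du total even with --apparent-size, one way to compare only the file contents is to add up the sizes of regular files. This is a sketch assuming GNU find and awk; the function name content_bytes is made up. Unlike du, it counts a hard-linked file once per name.

```shell
# Hypothetical helper: total content size in bytes of all regular files
# under a tree, ignoring directories and other non-regular entries.
content_bytes() {
    find "$1" -xdev -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

If the two trees were copied completely, content_bytes should print the same number for both, whatever the directories themselves occupy.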

> so is there a way to really compare two mirrored disks to see if the
> copy is good (ext4)?

well, you can check with rsync again, telling it to compare the
content as well (file size and time stamps are not good enough!):

   rsync -HAXaxi --checksum --dry-run SRC/. DST/.

If nothing shows up there, then I'd say the backup looks good.
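
For scripting, the same check can be wrapped so that it reports a result code. This is only a sketch around the command above; the function name verify_copy and its return codes are assumptions of this example, and the rsync build must support the -A and -X options.

```shell
# Hypothetical wrapper around the rsync command above: report whether rsync
# would change anything when comparing file contents by checksum.
# --dry-run means nothing is written to the destination.
verify_copy() {
    changes=$(rsync -HAXaxi --checksum --dry-run "$1"/. "$2"/.) || return 2
    if [ -z "$changes" ]; then
        echo "no differences reported"
    else
        # Print the itemized list of what rsync would transfer or update.
        printf '%s\n' "$changes"
        return 1
    fi
}
```

It would be called as "verify_copy SRC DST"; reading every file on both disks to compute checksums can take a long time on archives of this size.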

Have a nice day,
Berny


Re: difference between two disks

Carlos E. R.-2
In reply to this post by jdd@dodin.org
On 2018-01-23 20:56, [hidden email] wrote:

> On 23/01/2018 at 20:29, Andrei Borzenkov wrote:
>> 23.01.2018 22:27, [hidden email] wrote:
>>> On 23/01/2018 at 15:26, Andrei Borzenkov wrote:
>>>
>>>
>>>> One obvious example - file had non-zero block (and so it consumed real
>>>> storage) which was later overwritten by zeroes. It will continue to
>>>> consume real block on source, but on destination it is replaced by
>>>> hole (no storage consumption).
>>>>
>>> even with checksum control??
>>>
>>
>> checksums will see zeros in both cases.
>>
> really curious
Think of it as compression: the files are the same, but stored compressed.


--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)



Re: difference between two disks

Carlos E. R.-2
In reply to this post by jdd@dodin.org
On 2018-01-23 21:38, [hidden email] wrote:

> On 23/01/2018 at 15:56, Carlos E. R. wrote:
>
>>> Well, then we can work at that, it should be a short list.
>
> I try again. The two disks are not the same as before, but the files are
> and also come from rsync --delete
>
> linux-owxt:/run/media/jdd/intenso5to2 # ls -aR > ../5t02.txt
> linux-owxt:/run/media/jdd/intenso5to2 # cd ../intenso4to/
> linux-owxt:/run/media/jdd/intenso4to # ls -aR > ../4to.txt
> linux-owxt:/run/media/jdd/intenso4to # cd ..
> linux-owxt:/run/media/jdd # diff -b 5t02.txt 4to.txt  > diff.txt
>
> diff.txt is zero bytes
Well, "ls -aR" doesn't list file sizes. The lists of files are the same,
so no files are missing.

>
> linux-owxt:/run/media/jdd # du --apparent-size -s /run/media/jdd/intenso*
> 3239147738      /run/media/jdd/intenso4to
> 3239148282      /run/media/jdd/intenso5to2

The difference is 544 "blocks" (du's output units), not bytes.
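
The arithmetic, for reference. GNU du prints sizes in units of 1024 bytes by default, so, assuming no block-size option or environment variable was in effect, the difference would be roughly 544 KiB out of about 3 TiB.

```shell
# The two totals quoted above, as printed by "du --apparent-size -s".
size_4to=3239147738
size_5to2=3239148282
diff_units=$(( size_5to2 - size_4to ))
echo "$diff_units"    # prints 544
```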

>
>> find . -ls
>>
> linux-owxt:/run/media/jdd/intenso5to2 # find . -ls > ../find5.txt
> linux-owxt:/run/media/jdd/intenso5to2 # cd ../intenso4to/
> linux-owxt:/run/media/jdd/intenso4to # find . -ls > ../find4.txt
>
> the size of the to files is not identical, but the size of the file
> (more than 300Mb)
>
> # diff -b find5.txt find4.txt  > diff2.txt
>
> then diff2.txt is very large, the sum of the two files, for sure the
> diff syntax is not good
If the diff is very large, you did not run "find . -ls" in the
equivalent directories both times. Look at find5.txt and find4.txt; they
should look nearly identical, and the difference should be hard to spot.

If they do not, then obviously you did not copy identical directory
structures.
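
Another possible reason for a huge diff, not confirmed in this thread: "find . -ls" starts every line with the inode number and the block count, which normally differ between two disks, and find does not guarantee the same traversal order on both. A sketch that lists only fields comparable across disks, assuming GNU find and sort (the function name portable_listing is hypothetical):

```shell
# Hypothetical helper: one line per entry with type/mode, owner, group and
# path, plus the size for regular files only (directory sizes depend on the
# file system). Inode numbers and block counts are left out on purpose.
portable_listing() {
    ( cd "$1" && {
        find . -xdev -type f -printf '%M %u %g %s %p\n'
        find . -xdev ! -type f -printf '%M %u %g - %p\n'
      } | LC_ALL=C sort -k5 )
}

# Usage sketch:
#   portable_listing /run/media/jdd/intenso4to  > list4.txt
#   portable_listing /run/media/jdd/intenso5to2 > list5.txt
#   diff list4.txt list5.txt
```

With this listing, every line that diff reports points at a real difference in name, size, ownership or mode.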

--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)

