R.Wieser (11-20-19, 01:54 PM)
Hello all,

Just a heads-up:

Some time ago there was a discussion involving file.ReadAll having a problem
with files that have binary contents.

Just today I found that it also has problems with absolutely nothing. That
is, when reading an empty file. :-) It then throws a "reading past the end
of file" error.

In short, there are more reasons to avoid using 'ReadAll' ...
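
For anyone who wants to keep using ReadAll anyway, here is a minimal sketch of one possible guard. It assumes that checking AtEndOfStream right after opening is enough to detect an empty file; the function name ReadAllSafe and the path are made up for illustration.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: returns "" for an empty file instead of raising an error.
Function ReadAllSafe(path)
    Dim fso, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(path, ForReading)
    If ts.AtEndOfStream Then
        ReadAllSafe = ""            ' nothing to read, so ReadAll is never called
    Else
        ReadAllSafe = ts.ReadAll
    End If
    ts.Close
End Function

' Hypothetical path; point it at a zero-byte file to try the guard.
WScript.Echo Len(ReadAllSafe("C:\temp\empty.txt"))
```

This only sidesteps the empty-file error. It does nothing for the binary-contents problem mentioned above.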

Regards,
Rudy Wieser
Mayayana (11-20-19, 03:39 PM)
"R.Wieser" <address> wrote

| Just today I found that it also has problems with absolutely nothing. That
| is, when reading an empty file. :-) It then throws a "reading past the end
| of file" error.
|

Thanks. I never noticed that before. I guess it shows
once more how seriously Microsoft took IT people in 1998.
They apparently hired a handful of college students to
write scrrun.dll and assumed that no one would ever need
it for more than parsing a log file.
JJ (11-21-19, 07:00 PM)
On Wed, 20 Nov 2019 12:54:24 +0100, R.Wieser wrote:
> Hello all,
> Just a heads-up:
> Some time ago there was a discussion involving file.ReadAll having a problem
> with files that have binary contents.
> Just today I found that it also has problems with absolutely nothing. That
> is, when reading an empty file. :-) It then throws a "reading past the end
> of file" error.
> In short, there are more reasons to avoid using 'ReadAll' ...
> Regards,
> Rudy Wieser


IMO, that's not a quirk. ReadAll() does what its description says:

"Returns all characters from an input stream."

It doesn't say "if any".

So, if you make it read an empty file, it'll assume that there's data in the
file, but there is none. Thus the triggered exception.
Mayayana (11-21-19, 07:50 PM)
"JJ" <jj4public> wrote

| So, if you make it read an empty file, it'll assume that there's data in
| the file, but there is none. Thus the triggered exception.

I'd expect a zero-length string. But to be honest,
I've never really given it any thought. Typically I
add "on error resume next" to my scripts after they're
all done and polished. The reason for that is just this
kind of thing -- a silly error that's probably not going to
affect the result, and I don't want it to stop the
whole script.

In real world usage such errors are actually common.
For instance, if I want to clean my TEMP folders I'll
get errors about files in use. But if I use a script to
do it and add error trapping then it completes smoothly
and deletes everything that it can delete.

I don't think it ever occurred to me to do
something like, s = TS.ReadAll: If Len(s) > 0 then...

But with error trapping I guess the result is the same.
I typically wrap it in a small function. s = ReadFile(path).
So with error trapping in that function, for a blank
file I'll get back "".
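
A wrapper along those lines might look roughly like the sketch below. This is an illustration, not the actual function from anyone's scripts; the name ReadFile comes from the post, everything else is assumed.

```vbscript
Option Explicit

Const ForReading = 1

' Sketch of a ReadFile wrapper with local error trapping.
' Returns "" when the file is empty, missing, or otherwise unreadable.
Function ReadFile(path)
    Dim fso, ts
    ReadFile = ""                       ' default result if anything fails
    On Error Resume Next                ' only affects this function
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(path, ForReading)
    If Err.Number = 0 Then
        ReadFile = ts.ReadAll           ' on error the assignment is skipped
        ts.Close
    End If
    On Error GoTo 0
End Function
```

The trade-off is that an empty file and a missing file become indistinguishable to the caller, since both come back as "".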
R.Wieser (11-21-19, 09:39 PM)
JJ,

> IMO, that's not a quirk. ReadAll() does what it's described:


Nope.

> "Returns all characters from an input stream."


Correct. That would normally mean that the result would be an empty string,
not a "reading past the end of file" error.

Besides, even the error is wrong: if there is nothing to read, why is it then
(forcefully) trying to do so anyway?

> It doesn't say "if any".


Please, do *not* go down the "they have not explicitly said that ..." road. You
see, the reverse is also true, with the end result being that pretty much
anything goes, even random results that have nothing to do with the provided
filename.

Heck, it would even include deleting the file or converting its contents
into runes.

> So, if you make it read an empty file, it'll assume that there's data in
> the file, but there is none. Thus the triggered exception.


Then explain to me why it can stop exactly at the end of /any/ contents, no
matter how short or long, but is too stupid to detect a string-length of
zero. That simply does not make any sense.

No kiddo, as far as I can tell you're currently just trolling.

Regards,
Rudy Wieser
Mayayana (11-21-19, 10:04 PM)
"R.Wieser" <address> wrote

| Then explain to me why it can stop exactly at the end of /any/ contents, no
| matter how short or long, but is too stupid to detect a string-length of
| zero. That simply does not make any sense.
|

Actually, as we've found before, it doesn't seem to
do that. It assumes content without checking the file
size. Then it reads until it hits a null. I assume it's
using an API like CreateFile/ReadFile (or maybe VC++
functions) to get the file into a string. Then it sees
the end of the string as the first null, unless given
a specific string length to return.

I think all you can do is say, "Thanks, Scripting Guys!"

As I understand it, they're the bright bulbs who were
tasked with creating scrrun *and wrote it in VC6++!*.
So it's a Windows system library but it has to load the
VC6++ runtime.
R.Wieser (11-21-19, 10:46 PM)
Mayayana,

> Actually, as we've found before, it doesn't seem to
> do that. It assumes content without checking the file
> size.


Not quite. Remember that it returns a string the size of the file, but
that its contents are buggy ?

> Then it reads until it hits a null.


Again, not quite. It's the copying which is done after reallocating for a
bigger blob which is the culprit. Yes, I disassembled the involved code.
:-)

And just print out all the bytes in the string one-by-one and take a look at
the ones beyond the last whole multiple of ... 260 IIRC. Compare with the
last part of the file. You will see that they match. In other words, it
does read the whole thing.
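
A rough sketch of that check, for anyone who wants to repeat it. The helper name HexTail and the file path are made up for illustration, and it assumes a non-empty file and a single-byte ANSI code page.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: hex dump of the last n characters of s.
' Uses Asc (not AscW) so each character is shown as its ANSI byte value.
Function HexTail(s, n)
    Dim i, first, out
    first = Len(s) - n + 1
    If first < 1 Then first = 1
    out = ""
    For i = first To Len(s)
        out = out & Right("0" & Hex(Asc(Mid(s, i, 1))), 2) & " "
    Next
    HexTail = out
End Function

Dim fso, f, ts, viaReadAll, viaRead
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.GetFile("C:\temp\sample.gif")   ' hypothetical path

Set ts = f.OpenAsTextStream(ForReading)
viaReadAll = ts.ReadAll                     ' the suspect method
ts.Close

Set ts = f.OpenAsTextStream(ForReading)
viaRead = ts.Read(f.Size)                   ' reference: read exactly Size characters
ts.Close

WScript.Echo "ReadAll: " & Len(viaReadAll) & " chars, tail: " & HexTail(viaReadAll, 16)
WScript.Echo "Read   : " & Len(viaRead) & " chars, tail: " & HexTail(viaRead, 16)
```

If the description above holds, both lengths and both tails should match, even when the middle of the ReadAll string is damaged.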

> I think all you can do is say, "Thanks, Scripting Guys!"


/That/ I do agree with you. :-)

Regards,
Rudy Wieser
Mayayana (11-21-19, 11:31 PM)
"R.Wieser" <address> wrote

| > Then it reads until it hits a null.
|
| Again, not quite. It's the copying which is done after reallocating for a
| bigger blob which is the culprit. Yes, I disassembled the involved code.
| :-)
|

Ah. I never noticed that. It does actually get the entire file.
So what's different? If I ReadAll a GIF or Read(filelen) a GIF
I apparently get the same bytes. But the former will stop at the
first null when I try to read it while the latter will not. I don't
understand what you mean by "copying after reallocating".
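
For reference, the Read(filelen) variant could be sketched as below. The function name is hypothetical, and the zero-size guard is an assumption added so that Read is never called on an empty file.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: reads the whole file with Read(Size) instead of ReadAll.
Function ReadWholeFile(path)
    Dim fso, f, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.GetFile(path)
    If f.Size = 0 Then
        ReadWholeFile = ""              ' empty file: nothing to read
        Exit Function
    End If
    Set ts = f.OpenAsTextStream(ForReading)
    ReadWholeFile = ts.Read(f.Size)     ' ask for exactly Size characters
    ts.Close
End Function
```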
R.Wieser (11-22-19, 10:23 AM)
Mayayana,

> But the former will stop at the first null when I try to read


I'm not sure what you mean by "when I try to read" there. The whole file
gets processed, as is proven by the last bytes of the resulting malformed
string matching the last bytes of the file.

> If I ReadAll a GIF or Read(filelen) a GIF I apparently get the same bytes.


That fully depends on the size of the file. The ReadAll code internally
works in increments of 260 bytes. Take a file of at least that size and the
corruption will take place.

> But the former will stop at the first null when I try to read it
> while the latter will not.


Again, /it doesn't stop/.

> I don't understand what you mean by "copying after reallocating".


The "readall" method goes (simplified) like this:

- while not at EOF
  - try to read 260 bytes of data
  - convert from utf-8 to widestring
  - allocate a new blob of memory the size of the old one plus the size of
    the wide string
  - copy the contents of the old blob into the new blob <=== !!
  - copy the widestring into the new blob (at an offset equal to the old
    blob's size)
- wend
- return contents of the "new blob"

It goes wrong at the "copy the contents of the old blob into the new blob"
point. That copy routine stops at the first zero (leaving the remainder of
the "new blob" undefined) when it should just have copied the whole
specified block (copy X bytes from Y to Z).
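
The effect can be modelled in script with ordinary strings. The sketch below is only a simulation of the steps listed above, not the real scrrun code: the names BuggyCopy and SimulateReadAll are invented, the chunk size of 260 is taken from the description, and "undefined" memory is represented by "?" characters.

```vbscript
Option Explicit

Const CHUNK = 260   ' chunk size as described above (assumed, not verified here)

' Models a copy that stops at the first zero character.
' Everything from that zero onward is treated as undefined and shown as "?".
Function BuggyCopy(oldBlob)
    Dim p
    p = InStr(oldBlob, Chr(0))
    If p = 0 Then
        BuggyCopy = oldBlob             ' no zero: the copy is complete
    Else
        BuggyCopy = Left(oldBlob, p - 1) & String(Len(oldBlob) - p + 1, "?")
    End If
End Function

' Grows the result chunk by chunk, re-copying the old blob each time.
Function SimulateReadAll(data)
    Dim blob, pos
    blob = ""
    pos = 1
    Do While pos <= Len(data)
        blob = BuggyCopy(blob) & Mid(data, pos, CHUNK)
        pos = pos + CHUNK
    Loop
    SimulateReadAll = blob
End Function

Dim data, result
data = String(300, "A") & Chr(0) & String(300, "B")   ' 601 chars, zero at position 301
result = SimulateReadAll(data)
WScript.Echo Len(result)            ' 601: same length as the input
WScript.Echo Mid(result, 301, 1)    ' "?": damaged from the first zero onward
WScript.Echo Right(result, 5)       ' "BBBBB": the last chunk is intact
```

In this model the damage only shows up once a chunk containing a zero has to be re-copied, which is why short inputs and the final chunk come through untouched.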

Regards,
Rudy Wieser
JJ (11-22-19, 02:10 PM)
On Thu, 21 Nov 2019 20:39:52 +0100, R.Wieser wrote:
[..]
> No kiddo, as far as I can tell you're currently just trolling.
> Regards,
> Rudy Wieser


Oh, I remember...
It depends on whether the source file is a character or block/disk device.
The length of a block device is known, while a character device's is
unknown.
The exception will be thrown when the data length is known and is zero.
JJ (11-22-19, 02:13 PM)
On Thu, 21 Nov 2019 21:46:59 +0100, R.Wieser wrote:
> Mayayana,
> Not quite. Remember that it returns a string the size of the file, but
> that its contents are buggy ?
> Again, not quite. It's the copying which is done after reallocating for a
> bigger blob which is the culprit. Yes, I disassembled the involved code.
> :-)


That's a character translation problem which is related to character
encoding. It's an entirely different (buggy) matter.
Mayayana (11-22-19, 04:14 PM)
"R.Wieser" <address> wrote

| It goes wrong at the "copy the contents of the old blob into the new blob"
| point. That copy routine stops at the first zero (leaving the remainder
| of the "new blob" undefined) when it should just have copied the whole
| specified block (copy X bytes from Y to Z).
|

I see. Thanks. I probably never would have guessed it
was multiple reads. I've seen that kind of activity before
in Filemon logs but don't understand why a file would be
read so inefficiently. Maybe it's a relic of the ReadFile API?

Initially I'd had problems with file content
seeming to stop at the first null. I didn't look further because
the method was clearly unusable. But I also assumed it
was a problem of assuming string content and thus FSO looking
for a null as end of string, rather than reading a specified
number of bytes. From your description it appears that is
the problem, but in a more complicated way than I imagined.
R.Wieser (11-22-19, 04:47 PM)
JJ,

>> Again, not quite. It's the copying which is done after reallocating for a
>> bigger blob which is the culprit. Yes, I disassembled the involved code.

> That's a character translation problem which is related to
> character encoding. It's an entirely different (buggy) matter.


Nope, it's not. The block-copying code doesn't translate anything.

Besides, the problem doesn't appear for small (< 260 chars) files, or for
the last few bytes (filesize modulo 260) - and that includes any embedded
zeroes. If the encoding had been the problem, that would not have been
possible.

Regards,
Rudy Wieser
R.Wieser (11-22-19, 04:58 PM)
Mayayana,

> but don't understand why a file would be read so inefficiently.
> Maybe it's a relic of ReadFile API?


It's possibly related to yesteryear's low-memory 'puters (in comparison to
the current ones). Reading everything and only then converting costs three
times the size of the file (the data itself plus the resulting wide-string
output, which is twice as large). Then again, the rather inefficient memory
management (adding bite-sized parts) can't be good either ...

> From your description it appears that is the problem, but in a
> more complicated way than I imagined.


:-) Yep.

Regards,
Rudy Wieser
Mayayana (11-22-19, 05:10 PM)
"R.Wieser" <address> wrote

| > That's a character translation problem which is related to
| > character encoding. It's an entirely different (buggy) matter.
|
| Nope, it's not. The block-copying code doesn't translate anything.
|

That brings up another issue. UTF-8 was pretty much
unknown, maybe non-existent, when scrrun came out.
What FSO does is, I think, what VB6 does: Externally
it defaults to codepage ANSI while internally it stores
strings as unicode. (The help file says it defaults to ASCII,
but actually it's ANSI.)

That explains what you saw: bulking up to a Unicode
string as it reads in. But I very much doubt something
like 80 CE 32 would be translated to the UTF-16 equivalent.
It just comes through (fortunately) as 3 ANSI characters.