Archive: Text File Search Help


Hello all.

I need a hand with a text file search project I'm working on. I want to be able to check text files for duplicate field ID numbers. The ID number will always come 1 line after the term "EditBox" or "CheckBox". EditBox and CheckBox will always be the only item on their lines. So for example:


EditBox
1

EditBox
2

EditBox
2

EditBox
3

My plan was to read the text file until I reach either EditBox or CheckBox, then read the next line (the ID number) and check whether it is already in my array. If it is, I'll note that ID number as a duplicate and continue checking for others. If it is not, I'll add it to the array and again continue checking.
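The plan above could be sketched roughly as follows. This is only one possible approach, not something posted in the thread: it assumes a scratch INI file in $PLUGINSDIR can stand in for the array (the names seen.ini, "ids" and FindDuplicates are made up for illustration), that the input lives at C:\input.txt, and that the file uses Windows line endings. The loop sits in a Function because a later reply in this thread found that to be much faster than running it directly inside the Section. It has not been benchmarked.

```nsis
Name "DupCheck"
OutFile "DupCheck.exe"

!include "TextFunc.nsh"
!insertmacro TrimNewLines

Section
  InitPluginsDir                        ; creates $PLUGINSDIR (cleaned up on exit)
  StrCpy $R0 "$PLUGINSDIR\seen.ini"     ; hypothetical scratch file used as the "array"
  Call FindDuplicates
SectionEnd

Function FindDuplicates
  ClearErrors
  FileOpen $0 "C:\input.txt" r          ; assumed input path
  IfErrors done

  loop:
    FileRead $0 $1
    IfErrors close
    StrCmp $1 "EditBox$\r$\n" +2        ; marker found: skip the next check
    StrCmp $1 "CheckBox$\r$\n" 0 loop   ; neither marker: read the next line

    FileRead $0 $2                      ; the line after the marker is the ID
    IfErrors close
    ${TrimNewLines} "$2" $2
    StrCmp $2 "" loop                   ; ignore an empty ID line

    ClearErrors
    ReadINIStr $3 "$R0" "ids" "$2"      ; sets the error flag if the ID is not stored yet
    IfErrors firsttime
    DetailPrint "Duplicate ID: $2"
    Goto loop

  firsttime:
    WriteINIStr "$R0" "ids" "$2" "1"    ; remember this ID
    Goto loop

  close:
    FileClose $0

  done:
FunctionEnd
```

To check several files in one run, the "ids" section would presumably need clearing between files, for example with DeleteINISec.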

I tried to do this using the "LineRead" function from the wiki, reading every line, checking whether it was EditBox or CheckBox, and if so reading the next line. However, this was very slow, so I'm sure it was not the best way to do it. Most of the files I want to search will have between 4000 and 8000 lines of text. I would like to be able to search a folder of 50-100 files in a few minutes or less.

I just tried Afrow UK's FileSearch function, which can search a 9000-line file in a second or two and give me the number of times EditBox appears, so I think all this can be done. I just need a hand modifying one of these functions for my purpose. Any help would be much appreciated. Thanks all.

Jnuw

If the LineRead function is slow, that probably means each call reads through the file from the beginning until it reaches the line you want, so reading every line that way repeats most of the work on each call.

It really is easy to write your own function to read from a file. Using already existing functions in pieces or multiple times is not a good idea in this case. You need to write your own code so that it is as efficient as possible.

Are you using NSISArray for this?

-Stu


Name "Output"
OutFile "Output.exe"

!include "TextFunc.nsh"
!insertmacro TrimNewLines

Section
ClearErrors
FileOpen $0 "C:\input.txt" r
IfErrors end

loop:
FileRead $0 $1
IfErrors close
${TrimNewLines} "$1" $1
StrCmp $1 EditBox +2
StrCmp $1 CheckBox 0 loop

FileRead $0 $2
IfErrors close
${TrimNewLines} "$2" $2
#
MessageBox MB_OK "$$1={$1}$\r$\n$$2={$2}"
#
goto loop

close:
FileClose $0

end:
SectionEnd

If you want it to be even more efficient, you could remove the first TrimNewLines call and compare $1 to "EditBox$\r$\n" and "CheckBox$\r$\n". This should be fine if you are sure the file will always contain Windows new lines (and not UNIX for example), but otherwise you can leave it in.

-Stu


Hello guys. Thanks for your help, sorry I have been slow to respond. I found something very strange, at least to me. I actually was using some very similar code to what Instructor posted, but mine took over 30 seconds to do what Instructor's could do in 2.5 seconds.

So I copied Instructor's code right into my script, and now his code took over 30 seconds too. What I found is that running the same code using the MUI takes 33 seconds, whereas running it from a non-MUI exe takes only 3 seconds. Here are the 2 nsi files:


Name "NoMUI"
OutFile "NoMUI.exe"

!include "TextFunc.nsh"
!insertmacro TrimNewLines

Section
ClearErrors
FileOpen $0 "test.txt" r
IfErrors end

loop:
FileRead $0 $1
IfErrors close
StrCmp $1 "EditBox$\r$\n" +2
StrCmp $1 "CheckBox$\r$\n" 0 loop

FileRead $0 $2
IfErrors close
${TrimNewLines} "$2" $2
StrCpy $3 "$3, $2"
goto loop

close:
FileClose $0

end:
MessageBox MB_OK "$3"
SectionEnd


!include "MUI.nsh"

!include "TextFunc.nsh"
!insertmacro TrimNewLines

!insertmacro MUI_PAGE_INSTFILES
!insertmacro MUI_PAGE_FINISH
!insertmacro MUI_LANGUAGE "English"

Name "MUI"
OutFile "MUI.exe"

Section ""

ClearErrors
FileOpen $0 "test.txt" r
IfErrors end

loop:
FileRead $0 $1
IfErrors close
StrCmp $1 "EditBox$\r$\n" +2
StrCmp $1 "CheckBox$\r$\n" 0 loop

FileRead $0 $2
IfErrors close
${TrimNewLines} "$2" $2
StrCpy $3 "$3, $2"
goto loop

close:
FileClose $0

end:
MessageBox MB_OK "$3"

SectionEnd


I have attached these 2 NSI files, along with Test.txt, which is a sample text file of 9000 lines with the "EditBox" string intermingled within. Let me know if I have something wrong, but this seems odd to me. I would understand the MUI taking longer to load pages, but the section code shouldn't take longer to run, should it?

Thanks all!

The answer is:
XPStyle on

Put XPStyle on in the NoMUI script and you'll find that it then takes as long as the MUI script.
To be honest though, I'm really not sure why it takes longer with XPStyle on... I'll try some more things.

-Stu


Ah, here's the deal. For every instruction, NSIS has to move the progress bar position. With XPStyle on, each update takes a little longer because there is more visual work to do. Multiply that by the number of lines in the text file, and again by the number of instructions in the loop, and you get a large increase in time!

Fixing this is very easy though :). Just move the entire code from within the Section into a separate Function and Call the Function. You'll find that it now takes about 1 or 2 seconds because NSIS only has to move the progress bar once when it calls the function.
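As a rough sketch of that change, here is the MUI test script from earlier in the thread with the loop moved out of the Section. The function name ScanFile is made up for illustration, and the explicit reset of $3 is an addition; everything else is copied from the posted script.

```nsis
!include "MUI.nsh"

!include "TextFunc.nsh"
!insertmacro TrimNewLines

!insertmacro MUI_PAGE_INSTFILES
!insertmacro MUI_PAGE_FINISH
!insertmacro MUI_LANGUAGE "English"

Name "MUI"
OutFile "MUI.exe"

Section ""
  ; The Section now holds only two instructions; the loop runs inside the Function.
  Call ScanFile
  MessageBox MB_OK "$3"
SectionEnd

; ScanFile is a hypothetical name. The body is the same loop as before.
Function ScanFile
  StrCpy $3 ""                          ; start with an empty result list
  ClearErrors
  FileOpen $0 "test.txt" r
  IfErrors end

  loop:
    FileRead $0 $1
    IfErrors close
    StrCmp $1 "EditBox$\r$\n" +2
    StrCmp $1 "CheckBox$\r$\n" 0 loop

    FileRead $0 $2
    IfErrors close
    ${TrimNewLines} "$2" $2
    StrCpy $3 "$3, $2"                  ; append the ID to the list
    Goto loop

  close:
    FileClose $0

  end:
FunctionEnd
```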

I think this debunks the 'myth' that NSIS code is extremely slow. It isn't, so it appears... it's the damn progress bar, which we can't disable, that is the culprit.

-Stu


Afrow UK, yup, you got it. I get the same results. Nice work, thank you so much for your help on this one, I thought I was going crazy. Thanks again!


Glad to be of service :)

-Stu