Archive: NSISdl failing while resolving hostname


When calling NSISdl::download with the same URL, I notice that about 10% of the time it returns the failure message "resolving hostname".

I was thinking one thing I could do is put the NSISdl::download call in a loop so that it attempts a connection...say...5 times, or until it gets something other than "resolving hostname" as a response. Does that sound like a decent solution? Or is there another approach that would work better?


Or is there another approach that would work better?
10% sounds strange, but maybe try InetLoad.

I took a stab at using InetLoad, and it worked. Plus, it was surprisingly easy to implement (I only had to change NSISdl::download to InetLoad::load).

Man I love NSIS.


I have the same problem. Does anyone know how to solve it?

InetLoad seems to be too unstable, it often crashes when cancelling a download.


Joost, can you give me a script sample with a 'repeatable' crash? I'd like to fix this, but it never happens on my computers (W2K, Win98, XP, Win2003). The only crash report I have seen concerns the 'old style' mode (NSISdl display mode, which uses a fragment of its source code), and I could not reproduce that either. Or maybe you could run the plug-in under a debugger; that might point to the source of the bug. Thanks in advance.


It's probably a buffer overflow or something like that, so a sample script won't help. I will try to find the issue but it may take a few weeks.

The same bug has been reported before:

http://nsis.sourceforge.net/Talk:InetLoad_plug-in

NSISdl should still be fixed as well because it's the only plug-in we have that works on all Windows versions independent of Internet Explorer.


What kind of URLs look unstable with NSISdl?


The problem with NSISdl is that it often gives a "resolving hostname" error immediately. That's with standard HTTP URLs (NSISdl doesn't support FTP).

I also got a crash once when cancelling a NSISdl download. Do NSISdl and InetLoad share some code?


I've always used InetLoad and never had an issue, nor has anyone reported one. I used it in my SwUpdata software, so it would be used at least 10-50 times per update session.

NSISdl, on the other hand, caused some real issues for some people with a firewall or proxy server.

-Stu


Yes, InetLoad uses an NSISdl code fragment in the "old style" mode (progress bar on the installer window). I wrote about this above (italic text :) ). The last crash report http://forums.winamp.com/showthread....ad#post1955550 mentioned this mode only. To me this looks like a sync problem after the installer window's Cancel handling (which is not used in the other modes, Popup and Banner). Maybe a simple Sleep in ParentWndProc() before PostMessage(hDlg, WM_COMMAND, IDCANCEL, 0); can solve the problem. I can't test this because it does not reproduce on my computers.
NSISdl: in most cases, resolve failures for valid URLs are caused by short timeouts. Some error handling like this in asyncdns.cpp, _threadfunc() may help:


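...
// First attempt at a blocking lookup
hostentry = ::gethostbyname(_this->m_hostname);
// Retry once if Winsock reports a temporary failure (WSATRY_AGAIN)
if (!hostentry && WSAGetLastError() == WSATRY_AGAIN)
    hostentry = ::gethostbyname(_this->m_hostname);
if (hostentry)
...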

Using a sleep command is never a good solution. What kind of race condition do you think there might be?


The only additional InetLoad commands in the NSISdl display mode are the SetWindowLong(WndProc) subclassing calls for Cancel handling. Because it handles Cancel asynchronously, i.e. it sets a flag and attempts to stop or terminate the transfer, some weird behavior may occur at that moment.
Attached is an InetLoad version with some improvements. I tested Cancel 20 times (dial-up) with no crashes.
If my WSATRY_AGAIN change looks correct, we can also create an NSISdl script that downloads 100 short files and test the current NSISdl status. I can do this from the office LAN and from home dial-up.


I have not yet been able to reproduce the crash on my current system so I cannot tell whether your new one makes any difference. I will continue testing.


For now I suggest you update both InetLoad and NSISdl to fix this unsafe code. If someone gets a crash again we can continue to search for issues.

Can you also post a compiled NSISdl with the WSATRY_AGAIN change? This NSISdl bug is easy to reproduce: it fails about 5-10% of the time.


I added a retry count to gethostbyname and over 50 runs found 49 successes on the first attempt and 1 success on the second. But the final plug-in result was a resolve failure on 20% of runs! So I moved my attention to DNS thread synchronization, which may take more time. Current state attached.


I added a mutex to synchronize gethostbyname and now cannot reproduce the name resolve problem. But maybe the IPs are cached somewhere? Test version attached.


I think Windows does have some caching for hostnames, is that relevant for this bug?

Thanks for trying to solve this.


Tested on a fresh computer (office LAN, fast connection): the plug-in resolve problem no longer reproduces. I guess it is fixed, but an extended test is still required. Filling urls.txt with links to 4 - 20 kB files from all over the world may be very useful for final testing (up to 100 links if we are talking about percentages). Please note that NSISdl supports static files only (the server reports the content length and IE displays the file size in File->Properties).
About the included screenshot: www.qstar.com is currently unreachable from IE and ping cannot resolve the name to an IP, so the failure (and its reason) is correct. With longer timeouts (120 sec) it reports a resolve error after 3 gethostbyname attempts. The DNS loop breaks when either the attempt count or the timeout value is exceeded.
Finally, 2 files were updated compared to the last CVS version: asyncdns.h and asyncdns.cpp (attached).
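
Editor's sketch: the loop described in this post (stop when either the attempt count or the timeout is exceeded) could look roughly like the following. This is not the attached asyncdns.cpp. The names resolve_with_limits and ResolveOutcome are hypothetical, and the lookup is passed in as a callback so the example stays portable instead of calling gethostbyname directly.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>

// Hypothetical result type for this sketch (not part of NSISdl).
struct ResolveOutcome {
    bool resolved;        // true when a lookup attempt succeeded
    int attempts;         // number of lookup calls actually made
    std::string address;  // filled by the lookup on success
};

// Calls `lookup` until it succeeds, the attempt budget is used up,
// or the deadline has passed, whichever comes first.
ResolveOutcome resolve_with_limits(
    const std::function<bool(std::string&)>& lookup,
    int max_attempts,
    std::chrono::milliseconds timeout)
{
    assert(max_attempts > 0);
    typedef std::chrono::steady_clock steady;
    const steady::time_point deadline = steady::now() + timeout;

    ResolveOutcome out = {false, 0, std::string()};
    while (out.attempts < max_attempts && steady::now() < deadline) {
        ++out.attempts;
        if (lookup(out.address)) {
            out.resolved = true;
            break;
        }
    }
    return out;
}
```

With a budget of 2 or 3 attempts, as mentioned in the thread, a lookup that fails once and then succeeds is reported as resolved, while a host that never resolves fails after the budget is spent.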


That's great news. I'll try to find some time to test it soon.


So the problem is only with timeouts of the DNS server? All that is needed is to try again?

About the mutex, it is named and therefore global. That could pose a problem with multiple installers running NSISdl simultaneously. A critical section should be used. It's faster too.
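
Editor's sketch: the point about lock scope can be illustrated with a process-local lock. The example below uses std::mutex as a portable stand-in for a Win32 critical section; like a critical section, it is only visible inside one process, so two installers running at the same time would not block each other. SerializedLookup and its lookup callback are hypothetical names, not NSISdl code.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical wrapper: serializes calls to a lookup function
// within a single process.
class SerializedLookup {
public:
    typedef std::function<bool(const std::string&, unsigned long&)> Lookup;

    explicit SerializedLookup(Lookup lookup) : lookup_(std::move(lookup)) {
        assert(static_cast<bool>(lookup_)); // a callable must be supplied
    }

    bool operator()(const std::string& host, unsigned long& addr) {
        // Process-local lock: held only for the duration of one lookup.
        std::lock_guard<std::mutex> guard(lock_);
        return lookup_(host, addr);
    }

private:
    std::mutex lock_;
    Lookup lookup_;
};
```

A named Win32 mutex, by contrast, is shared by every process that opens the same name, which is the cross-installer interference described above.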


No, as I wrote above, DNS timeouts accounted for only 1-2% of all errors in my tests. The other 98-99% were caused by a strange sync problem: the gethostbyname() thread reports 'success', but the main thread gets 'unresolvable'. The only idea I have about a possible reason is strange Windows behaviour on short sleeps (Sleep(10) was used in the main loop).
A critical section looks better, OK, I changed the code. I also reduced the max DNS attempts count to 2; this worked correctly even with misconfigured DNS servers in our LAN.


I found out what the problem was. It was indeed a threading issue.

  1. Main loop asks for name resolution.
  2. A thread is created for the asynchronous operation.
  3. Main loop keeps querying the name resolution status.
  4. Main loop calls resolve().
  5. resolve() sees m_addr is still NULL just before its time slice ends.
  6. The name resolution thread gets a time slice, finishes the resolution and sets m_thread_kill to 1.
  7. resolve() goes back into action, sees m_thread_kill is 1 and tries to create a new thread.
  8. resolve() kills itself because m_thread still holds a valid thread handle.
  9. User gets a name resolution error even though the name was resolved.
This test for m_thread before creating a new thread is a bit problematic because m_thread is only reset in the destructor. To solve this in the simplest manner, I've removed m_thread_kill and added a member function that waits for the thread to die and resets m_thread. Once m_addr gets a result, the code will now call this function. The thread creation code will not create the thread until m_thread is NULL. This way, the class is now truly reusable and the issue above is solved because the test that caused it is no longer needed.
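
Editor's sketch: the fix described above (wait for the worker thread to finish and reset the handle before another thread may be created) can be modelled in portable C++ as follows. This is not the committed asyncdns.cpp. AsyncResolver, wait_for_thread and the state values are hypothetical, std::thread stands in for the Win32 thread handle m_thread, and the return codes (0 resolved, 1 pending, -1 failed) are an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>

class AsyncResolver {
public:
    // Hypothetical lookup signature: returns true and fills `addr` on success.
    typedef std::function<bool(const std::string&, unsigned long&)> Lookup;

    explicit AsyncResolver(Lookup lookup)
        : lookup_(std::move(lookup)), state_(kIdle), addr_(0) {}
    ~AsyncResolver() { wait_for_thread(); }

    // Returns 0 when resolved (addr is filled), 1 while pending, -1 on failure.
    int resolve(const std::string& host, unsigned long& addr) {
        const int s = state_.load();
        if (s == kPending) return 1;
        if (s == kIdle) {
            // A worker is only started once the previous one has been reaped.
            assert(!thread_.joinable());
            state_.store(kPending);
            thread_ = std::thread([this, host]() {
                unsigned long a = 0;
                const bool ok = lookup_(host, a);
                addr_ = a;                          // publish the address first,
                state_.store(ok ? kDone : kFailed); // then the final state
            });
            return 1;
        }
        // The worker has published a result: wait for it to exit, then report.
        wait_for_thread();
        addr = addr_;
        state_.store(kIdle); // simplification: a finished lookup re-arms the object
        return s == kDone ? 0 : -1;
    }

private:
    enum State { kIdle, kPending, kDone, kFailed };

    // Counterpart of "wait for the thread to die and reset m_thread".
    void wait_for_thread() {
        if (thread_.joinable()) thread_.join(); // after join() the handle is empty
    }

    Lookup lookup_;
    std::atomic<int> state_;
    unsigned long addr_;
    std::thread thread_;
};
```

Because resolve() never tries to start a second worker while a finished one is still attached, the sequence in steps 5 to 9 cannot turn a successful lookup into an error in this model.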

Thanks for your help.