Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise (some) exceptions by default #329

Open
0x4c47 opened this issue Jan 27, 2023 · 2 comments
Open

Raise (some) exceptions by default #329

0x4c47 opened this issue Jan 27, 2023 · 2 comments

Comments

@0x4c47
Copy link

0x4c47 commented Jan 27, 2023

Exceptions raised within feedparser are currently caught internally and an empty/half-parsed result is returned with the exception in the bozo_exception attribute.

To be honest: I personally think this is not a good design choice for default behavior in Python. I think by default exceptions should be raised for me to be handled. It's confusing because most other librarie do it differently and raise exceptions instead of just returning an empty/incomplete result with special exception attributes.

I understand that the "half-parsing" part could be required in some cases. But that could be something enabled by a parameter instead of doing it by default? Or just include the half-parsed feed in the exception?

E.g. I think for a network timeout error (feedparser doesn't even get any data to parse because the URL is unreachable) it makes no sense at all to return an empty result with a wrapped exception by default. Instead the timeout error should be raised as an exception.

The documentation is also not very clear about this: It suggests that bozo is only about malformed feeds. But in fact it's about all exceptions.

If this topic was already discussed, feel free to close this issue immediately. But I couldn't find anything about this here.

@kurtmckee
Copy link
Owner

I 100% agree. I'll consider places where raising exceptions makes sense.

@dechamps
Copy link

dechamps commented Jul 1, 2024

E.g. I think for a network timeout error (feedparser doesn't even get any data to parse because the URL is unreachable) it makes no sense at all to return an empty result with a wrapped exception by default. Instead the timeout error should be raised as an exception.

This. Quite frankly this code should never have been written:

except urllib.error.URLError as error:
result.update(
{
"bozo": True,
"bozo_exception": error,
}
)
return result

All this does is turn a critical error into an incorrect, misleading result (a feed with no entries). This is not helpful at all and caused quite a bit of confusion when I was trying to find out why my script successfully completed without processing any entries.

The current behavior is extremely surprising. Catastrophic errors where the feed cannot be read, or cannot be parsed at all, should definitely raise an exception to the caller.

This "bozo" error reporting mechanism should only be used for partial failures, e.g. where some data was recovered from the feed despite it being malformed. (Exceptions are not a good fit for this case since a function can't return some data and raise an exception at the same time.) It should not be used for total failures.

The documentation is also not very clear about this: It suggests that bozo is only about malformed feeds. But in fact it's about all exceptions.

This too. The documentation is quite confusing - most people would not think of an unreachable URL as a feed with "malformed XML"!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants