It's something I am working on.

Basically, imagine I have thousands of entries like the data in my attachment. I am attempting to predict whether a person will have a stroke before age 40 (crazy example, totally made up data).

So to set up a logistic regression, I know how to make binary variables for owning a pet, gender, and college grad or not. I understand that part of the math.

What I don't understand is how to handle continuous variables like "hours exercised per week."

For example, what if we knew that the more hours you exercised, the less likely you were to get a stroke? And that the relationship was linear across the whole range? So for the sake of argument, what if exercising 40, 60, 80 hours a week just kept lowering your chance of a stroke at the same rate (crazy example, I know, just assume with me)?

Well, in that case, I can see how the coefficient my logistic regression would produce for "hours exercised per week" would make sense.
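Written out, the model I have in mind for that case looks roughly like the following. The symbols are my own labels, not from any textbook: p is the probability of a stroke before 40, and the x's are the variables from my attachment.

```latex
\log\frac{p}{1-p}
  = \beta_0
  + \beta_1\, x_{\text{pet}}
  + \beta_2\, x_{\text{gender}}
  + \beta_3\, x_{\text{college}}
  + \beta_4\, x_{\text{hours}}
```

As I understand it, this says each extra hour of exercise shifts the log-odds of a stroke by the same amount, \beta_4, no matter how many hours you already exercise. That is the part I am not sure holds in a realistic case.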

But a more likely scenario is that exercising a few hours a week is very good for reducing stroke risk, but then you hit a point past which more exercise actually INCREASES your chance of stroke (again, made up example, but let's suppose we know it's true).

So when we know (or suspect) that a continuous variable like "hours exercised per week" has a nonlinear relationship with the outcome, how does this affect our logistic regression? I assume the model would be less accurate at predicting than if the relationship were linear.

How can we structure our logistic regression model in such a situation? I read something about transforming the non-linear independent variable.
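To make the question concrete, here is a rough sketch of what I take "transforming" to mean: adding a squared hours term so the fitted log-odds can bend. Everything in it is an assumption for illustration. The data are simulated, the names (hours, stroke, fit_logistic) are made up, and a quadratic is only one possible transformation, so I may well have the wrong idea.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
        w = p * (1.0 - p)                       # per-row weights
        grad = X.T @ (y - p)                    # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])           # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of the data under coefficients beta."""
    z = X @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

# Simulated data (made up): risk is lowest near 8 hours/week and rises on both sides.
rng = np.random.default_rng(0)
n = 5000
hours = rng.uniform(0.0, 20.0, size=n)
true_logit = -2.0 + 0.03 * (hours - 8.0) ** 2
stroke = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ones = np.ones(n)
X_linear = np.column_stack([ones, hours])              # hours enters linearly only
X_quad = np.column_stack([ones, hours, hours ** 2])    # hours plus hours squared

beta_linear = fit_logistic(X_linear, stroke)
beta_quad = fit_logistic(X_quad, stroke)

ll_linear = log_likelihood(X_linear, stroke, beta_linear)
ll_quad = log_likelihood(X_quad, stroke, beta_quad)

# Hours at which the fitted quadratic log-odds are lowest: -b1 / (2 * b2).
turning_point = -beta_quad[1] / (2.0 * beta_quad[2])

print("log-likelihood, linear term only:", ll_linear)
print("log-likelihood, with squared term:", ll_quad)
print("estimated lowest-risk hours/week:", turning_point)
```

If I understand correctly, the model is still linear in its coefficients. Only the input has been transformed, so the usual logistic regression fitting still applies.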

Does the linearity of the continuous variable in a logistic regression even matter?

These are the kinds of questions I have. I know it's a lot, but I can't find a good answer or explanation of how to account for and think about non-linear independent variables in a logistic regression.

Thank you all for reading/helping.