Query for databases not scraped properly?
Hi,
With this YAML configuration file:
#
# **LDAP2PG SAMPLE CONFIGURATION**
#
# This is a sample starting point configuration file for ldap2pg.yml. Including
# static roles, groups, privilege and LDAP query.
#
# This configuration assumes the following principles:
#
# - All LDAP users are grouped in `ldap_roles` group.
# - Read privileges are granted to `readers` group.
# - Write privileges are granted to `writers` group.
# - DDL privileges are granted to `owners` group.
# - We have one or more databases with public and maybe a schema.
# - Grants are not specific to a schema. Once you're writer in a database, you
# are writer to all schemas in it.
#
# Adapt to your needs! See also full documentation on how to configure ldap2pg
# at https://ldap2pg.readthedocs.io/en/latest/config/.
#
verbosity: 5
ldap:
  #HA ldap handling
  uri: "ldap://ldap.example.com"
  binddn: uid=testuser,cn=users,cn=accounts,dc=example,dc=com
  password: password
postgres:
  dsn: postgres://user:password@demodb.xxxxxxxxxxxxxxx.amazonaws.com:5432/demodb
  # Scope the database where to purge objects when dropping roles. This is the
  # scope of grant on `__all__` databases.
  #databases_query: [postgres, appdb, olddb]
  database_query: |
    SELECT datname FROM pg_database
    WHERE datallowconn IS TRUE
    AND datname != 'rdsadmin' AND datname != 'template0' AND datname != 'template1';
  # List of managed schema. This skip pg_toast, pg_temp1, etc. but not pg_catalog.
  schemas_query: |
    SELECT nspname FROM pg_catalog.pg_namespace
    WHERE nspname NOT LIKE 'pg_%' AND nspname NOT LIKE 'information_schema';
  # Return managed roles which can be dropped or revoked.
  managed_roles_query: |
    SELECT 'public'
    UNION
    SELECT DISTINCT role.rolname
    FROM pg_roles AS role
    LEFT OUTER JOIN pg_auth_members AS ms ON ms.member = role.oid
    LEFT OUTER JOIN pg_roles AS ldap_roles
      ON ldap_roles.rolname = 'ldap_roles' AND ldap_roles.oid = ms.roleid
    WHERE role.rolname IN ('ldap_roles', 'readers', 'writers', 'owners')
      OR ldap_roles.oid IS NOT NULL
    ORDER BY 1;
  # Since readers/writer/owners groups are globals, we have a global
  # owners_query.
  owners_query: |
    SELECT DISTINCT role.rolname
    FROM pg_catalog.pg_roles AS role
    JOIN pg_catalog.pg_auth_members AS ms ON ms.member = role.oid
    JOIN pg_catalog.pg_roles AS owners
      ON owners.rolname = 'owners' AND owners.oid = ms.roleid
    ORDER BY 1;
privileges:
  # Define an privilege group `ro` with read-only grants
  ro:
  - __connect__
  - __execute__
  - __select_on_tables__
  - __select_on_sequences__
  - __usage_on_schemas__
  - __usage_on_types__
  # `rw` privilege group lists write-only grants
  rw:
  - __temporary__
  - __all_on_tables__
  - __all_on_sequences__
  # `ddl` privilege group lists DDL only grants.
  ddl:
  - __create_on_schemas__
sync_map:
# First, setup static roles and grants
- roles:
  - names:
    - ldap_roles
    - readers
    options: NOLOGIN
    comment: Custom static comment.
  - name: writers
    # Grant reading to writers
    parent: readers
    options: NOLOGIN
  - name: owners
    # Grant read/write to owners
    parent: writers
    options: NOLOGIN
  # Now grant privileges to each groups
  grant:
  - privilege: ro
    role: readers
    # Let's everyone see pg_catalog
    schema: __all__
  - privilege: rw
    role: writers
    # But avoid writers to write in pg_catalog
    schema: public
  # Allow ddl to create tables in public only
  - privilege: ddl
    role: owners
    schema: public
  # owners must have write access to pg_catalog
  - privilege: rw
    role: owners
    schema: pg_catalog
  # Grants on specific schema appdb.appns:
  - privilege: rw
    role: writers
    database: appdb
    schema: appns
  - privilege: ddl
    role: owners
    database: appdb
    schema: appns
# Now query LDAP to create roles and grant them privileges by parenting.
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: "(cn=dba)"
  role:
    name: '{member.cn}'
    options: LOGIN SUPERUSER
    parent:
    - ldap_roles
    - owners
    comment: "Custom comment from LDAP: {dn}"
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: "(cn=app*)"
  role:
    name: '{member.cn}'
    options: LOGIN
    parent:
    - ldap_roles
    - writers
    on_unexpected_dn: warn
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: |
      (&
        (cn=bi)
        (objectClass=*)
      )
  role:
    name: '{member.cn}'
    options: LOGIN
    parent:
    - ldap_roles
    - readers
I get a connection attempt on the “rdsadmin” database, which should be excluded by my databases_query.
Error message:
[ldap2pg.script ERROR] Unhandled error:
[ldap2pg.script ERROR] Traceback (most recent call last):
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/script.py", line 94, in main
[ldap2pg.script ERROR] exit(wrapped_main(config))
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/script.py", line 70, in wrapped_main
[ldap2pg.script ERROR] count = manager.sync(syncmap=config['sync_map'])
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/manager.py", line 236, in sync
[ldap2pg.script ERROR] schemas = self.inspector.fetch_schemas(databases, ldaproles)
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/inspector.py", line 233, in fetch_schemas
[ldap2pg.script ERROR] for dbname, psql in self.psql.itersessions(databases):
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/psql.py", line 73, in itersessions
[ldap2pg.script ERROR] with self(dbname) as session:
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/ldap2pg/psql.py", line 142, in __enter__
[ldap2pg.script ERROR] self.conn = psycopg2.connect(self.connstring)
[ldap2pg.script ERROR] File "/usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py", line 130, in connect
[ldap2pg.script ERROR] conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
[ldap2pg.script ERROR] psycopg2.OperationalError: FATAL: pg_hba.conf rejects connection for host "X.X.X.X", user "username", database "rdsadmin", SSL on
[ldap2pg.script ERROR] FATAL: pg_hba.conf rejects connection for host "X.X.X.X", user "username", database "rdsadmin", SSL off
[ldap2pg.script ERROR] Please file an issue at https://github.com/dalibo/ldap2pg/issues with full log.
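Reading the traceback from the bottom up: `fetch_schemas` iterates over the list of databases through `itersessions`, which opens one connection per database, so a single database that rejects connections aborts the whole run. The following is a minimal sketch of that control flow, not ldap2pg's actual code. `ConnectionRejected`, `REJECTED`, `session` and the sample database lists are hypothetical stand-ins introduced for illustration; only the names `itersessions` and `fetch_schemas` come from the traceback.

```python
from contextlib import contextmanager


class ConnectionRejected(Exception):
    """Hypothetical stand-in for psycopg2.OperationalError."""


# Hypothetical: databases the server refuses connections to.
REJECTED = {"rdsadmin"}


@contextmanager
def session(dbname):
    """Pretend to open a connection, failing the way pg_hba.conf would."""
    if dbname in REJECTED:
        raise ConnectionRejected(
            'pg_hba.conf rejects connection for database "%s"' % dbname)
    yield "session:" + dbname


def itersessions(databases):
    """Yield (dbname, session) pairs, opening one session per database."""
    for dbname in databases:
        with session(dbname) as s:
            yield dbname, s


def fetch_schemas(databases):
    """Visit every database; one rejected connection aborts the loop."""
    visited = []
    for dbname, _ in itersessions(databases):
        visited.append(dbname)
    return visited


# Sample list as an unfiltered query might return it (rdsadmin included).
default_list = ["demodb", "postgres", "rdsadmin"]
# Sample list as the filtered query would return it (rdsadmin excluded).
filtered_list = ["demodb", "postgres"]

print(fetch_schemas(filtered_list))
try:
    fetch_schemas(default_list)
except ConnectionRejected as exc:
    print("aborted:", exc)
```

With the filtered list the loop completes; with the unfiltered one it stops at `rdsadmin`, which matches the error above and suggests the custom query is not the one being executed.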
Trying to understand your code, I added some logging to python3.5/dist-packages/ldap2pg/inspector.py:
class PostgresInspector(object):
    def __init__(
            self, psql=None, privileges=None, roles_blacklist=None,
            shared_queries=None, **queries):
        self.psql = psql
        self.privileges = privileges or {}
        self.shared_queries = shared_queries or {}
        self.queries = queries
        # Added for debugging: dump the queries the inspector receives.
        msg = ("MARKER {q} {c}").format(q=type(self.queries), c=self.queries)
        logging.error(msg)
Error message output:
[root ERROR] MARKER <class 'dict'> {'all_roles': 'SELECT\n role.rolname, array_agg(members.rolname) AS members, {options}\nFROM\n pg_catalog.pg_roles AS role\nLEFT JOIN pg_catalog.pg_auth_members ON roleid = role.oid\nLEFT JOIN pg_catalog.pg_roles AS members ON members.oid = member\nGROUP BY role.rolname, {options}\nORDER BY 1;\n', 'databases': 'SELECT datname FROM pg_catalog.pg_database\nWHERE datallowconn IS TRUE ORDER BY 1;\n', 'managed_roles': "SELECT 'public'\nUNION\nSELECT DISTINCT role.rolname\nFROM pg_roles AS role\nLEFT OUTER JOIN pg_auth_members AS ms ON ms.member = role.oid\nLEFT OUTER JOIN pg_roles AS ldap_roles\n ON ldap_roles.rolname = 'ldap_roles' AND ldap_roles.oid = ms.roleid\nWHERE role.rolname IN ('ldap_roles', 'readers', 'writers', 'owners')\n OR ldap_roles.oid IS NOT NULL\nORDER BY 1;\n", 'owners': "SELECT DISTINCT role.rolname\nFROM pg_catalog.pg_roles AS role\nJOIN pg_catalog.pg_auth_members AS ms ON ms.member = role.oid\nJOIN pg_catalog.pg_roles AS owners\n ON owners.rolname = 'owners' AND owners.oid = ms.roleid\nORDER BY 1;\n", 'schemas': "SELECT nspname FROM pg_catalog.pg_namespace\nWHERE nspname NOT LIKE 'pg_%' AND nspname NOT LIKE 'information_schema';\n"}
The query seems to be the default one defined in config.py (lines 348-350):
'databases': 'SELECT datname FROM pg_catalog.pg_database\nWHERE datallowconn IS TRUE ORDER BY 1;\n'
Am I missing something here?
Other parameters in the conf file seem to be scraped correctly.
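One detail worth checking in the configuration above: the active key is spelled `database_query`, while the commented-out line and the description use `databases_query`. The keys that were picked up (`schemas_query`, `managed_roles_query`, `owners_query`) each match a name in the MARKER output, so a likely explanation is that the singular key is not recognized and the default stays in place. The sketch below illustrates that idea under stated assumptions. `KNOWN_KEYS`, `resolve_databases_query` and `find_unknown_keys` are hypothetical helpers written for this example, not part of ldap2pg.

```python
import difflib

# Assumption: each inspector query <name> is read from the YAML key
# `<name>_query` under `postgres:`. This mirrors the names seen in the
# MARKER output; it is not ldap2pg's actual loader.
KNOWN_KEYS = {
    "dsn",
    "databases_query",
    "schemas_query",
    "managed_roles_query",
    "owners_query",
}

# Default query as it appears in the MARKER output.
DEFAULT_DATABASES_QUERY = (
    "SELECT datname FROM pg_catalog.pg_database\n"
    "WHERE datallowconn IS TRUE ORDER BY 1;\n"
)


def resolve_databases_query(postgres_section):
    """Return the user's query if the expected key is set, else the default."""
    return postgres_section.get("databases_query", DEFAULT_DATABASES_QUERY)


def find_unknown_keys(postgres_section):
    """Map each unrecognized key to its closest known spelling, or None."""
    suggestions = {}
    for key in postgres_section:
        if key in KNOWN_KEYS:
            continue
        close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
        suggestions[key] = close[0] if close else None
    return suggestions


# The `postgres:` section as written in the configuration (abridged to
# the two keys that matter here).
postgres_section = {
    "dsn": "postgres://user:password@demodb.xxxxxxxxxxxxxxx.amazonaws.com:5432/demodb",
    "database_query": (
        "SELECT datname FROM pg_database\n"
        "WHERE datallowconn IS TRUE\n"
        "AND datname != 'rdsadmin' AND datname != 'template0' "
        "AND datname != 'template1';\n"
    ),
}

# The misspelled key is never looked up, so the default query is used.
print(resolve_databases_query(postgres_section) == DEFAULT_DATABASES_QUERY)
print(find_unknown_keys(postgres_section))
```

If this is what happens, renaming the key to `databases_query` should make the custom query take effect; a check like `find_unknown_keys` is one way a loader could surface such typos instead of ignoring them silently.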
And by the way, why are you doing your own mapping and not using something like: https://pyyaml.org/wiki/PyYAML ?
Thanks in advance.
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
@bersace Nice work. Thanks for the great follow-up. Closing this.
Sure thing, I’ll leave the ticket open until then and let you know about my setup experience. 😉